Abstract

Given a set of differential equations whose description involves unknown parameters, 
such as reaction constants in chemical kinetics, and supposing that one may at any time 
measure the values of some of the variables and possibly apply external inputs to help excite 
the system, how many experiments are sufficient in order to obtain all the information that is 
potentially available about the parameters? This paper shows that the best possible answer
(assuming exact measurements) is 2r+1 experiments, where r is the number of parameters.



1 Introduction 

Suppose that we are given a set of differential equations whose description involves unknown 
parameters, such as reaction constants in chemical kinetics, resistances in electrical networks, 
or damping coefficients in mechanical systems. At any time, we may measure the value of 
some of the variables, or more generally of certain functions of the variables, and we may also 
apply external inputs to help excite the system so as to elicit more information. Measurements 
are assumed to be accurate, with no observation noise. We address the following question: 
how many experiments are sufficient in order to obtain all the information that is potentially 
available about the parameters? The main result is: 2r+1 experiments, where r is the number
of parameters.

Questions of this type appear in many areas, and indeed the identification (or, when
parameters are thought of as constant states, the observation) problem is one of the central topics in
systems and control theory. However, the main motivation for this note arose from recent work
on cell signaling pathways. In that field, and in contrast to the standard paradigm in control 
theory, it is not possible to apply arbitrary types of inputs to a system. Often inputs, such 
as growth factors or hormones, are restricted to be applied as steps of varying durations and 
amplitudes (or perhaps combinations of a small number of such steps), but seldom does one 
have the freedom assumed in control-theory studies, where for instance closure of the input class 
under arbitrary concatenations is needed in order to prove the basic theorems on observability. 

The "2r+1" expression appears often in geometry and dynamical systems theory. It is the
embedding dimension in the "easy" version of Whitney's theorem on representing abstract
manifolds as submanifolds of Euclidean space, and it is also the dimension in which r-dimensional
attractors are embedded, in Takens' classical work. In Aeyels' control theory papers, it is the
number of samples needed for observability of generic r-dimensional dynamical systems with no 
inputs. Technically, our problem is quite different, as evidenced by the fact that, in contrast to 
these studies, which deal with smooth manifolds and systems, we require analytic dependence 
on parameters, and our main conclusion is false in the more general smooth case. Nonetheless, 
there are relationships among the topics, which we discuss in the paper. 

Let us start to make the problem precise. The system of interest will be assumed to have 
this form: 

    $\dot{z} = f(z,u,x)$        (1)
    $z(0) = \chi(x)$            (2)
    $y = h(z,u,x)$              (3)

where dot indicates derivative with respect to time, and (dz/dt)(t) = f(z(t),u(t),x) depends 
on a time-invariant parameter x and on the value of the external input u at time t. The internal 
state z of the system, as well as the external inputs, are vector functions, and the parameter 
x is a constant vector (we prefer to speak of "a vector parameter" as opposed to "a vector of 
parameters" so that we can later say "two parameters" when referring to two such vectors). 
The measurements at time t are represented by y(t) = h(z(t),u(t), x); typically, y does not 
depend directly on inputs nor parameters, and is simply a subset of the state variables z, i.e. 
the function h is a projection. We suppose that the initial state is also parametrized, by a
function $\chi$. One more item is added to the specification of the given system: a class of inputs
$\mathcal{U}$, meaning a set of functions of time into some set U. All definitions will be with regard to
time functions u(t) in this set. Before providing more details, let us discuss an example.



1.1 An Example 

As an illustration, we take the following system: 

    $\dot{M} = \frac{E^m}{1+E^m} - aM, \qquad \dot{E} = M - bE$        (4)

for the production of an enzyme E and the corresponding messenger RNA M. The first term 
for M models repression, the positive Hill constant m (not necessarily an integer) specifying the 
strength of this negative feedback, and the positive constants a and b account for degradation. 



This is a (nondimensionalized) modification by Griffith [13] of the classical operon model of 
Goodwin pT|j ; see also the textbook Q, pp. 208 and 308, as well as the recent paper 
which describes other variations which are biologically more accurate. In (4), the state z is
the vector (M,E). We assume first that there is no true external input to the system, so 
experiments consist of simply letting the system evolve from its initial state up to certain time 
T, and measuring M(T) at the end of the interval [0, T]. (That is, the measured quantity is 
the amount of RNA; currently gene arrays are used for that purpose.) Experiments differ only 
by their length T. (To fit in the general framework, where inputs are allowed, we may simply 
take the set U of input values to be, for instance, {0}, so that the set $\mathcal{U}$ consists of just one
function, the input u = 0.) We take as parameters for this problem the possible 5-dimensional 
positive vectors: 

    $x = (M_0, E_0, m, a, b)$






which list the initial conditions as well as the three constants. Thus $\chi(x)$ is just the vector
$(M_0, E_0)$, and $f(M,E,M_0,E_0,m,a,b) = \left(\frac{E^m}{1+E^m} - aM,\; M - bE\right)$. As output function h, we take
h(M, E) = M. It is not possible to identify all parameters in this example: for any positive
numbers $k$ and $\ell$, the parameter

    $x = \left(1,\; k,\; \ell,\; \frac{k^\ell}{1+k^\ell},\; \frac{1}{k}\right)$

gives rise to the same output $M(t) \equiv 1$, because for this choice both right-hand sides in (4)
vanish at the initial state, so the system starts at an equilibrium. Our general theorem will imply
that a set of measurements taken at a random choice of 2r+1 = 11 instants is sufficient in order to
distinguish between any two parameters which give rise to different output functions M(t). (Fewer
than 2r+1 measurements may be enough in any given example; the 2r+1 bound is a very general
upper bound, which is best possible in the sense that for some systems, no fewer will do.)
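
The non-identifiability claim can be checked directly. The following is a minimal sketch, not part of the original analysis: it assumes the family $(M_0,E_0,m,a,b) = (1, k, \ell, k^\ell/(1+k^\ell), 1/k)$, obtained here by setting both right-hand sides of (4) to zero, and the helper names `vector_field` and `flat_family` are hypothetical.

```python
def vector_field(M, E, m, a, b):
    """Right-hand side of system (4): returns (dM/dt, dE/dt)."""
    hill = E**m / (1.0 + E**m)
    return (hill - a * M, M - b * E)


def flat_family(k, ell):
    """Hypothetical helper: the parameter (M0, E0, m, a, b) built from k, ell > 0."""
    return (1.0, k, ell, k**ell / (1.0 + k**ell), 1.0 / k)


for k, ell in [(0.5, 1.0), (2.0, 3.0), (7.0, 0.4)]:
    M0, E0, m, a, b = flat_family(k, ell)
    dM, dE = vector_field(M0, E0, m, a, b)
    # Both derivatives vanish up to rounding, so (M, E) stays at (1, k)
    # and the measured output M(t) is identically 1 for every (k, ell).
    print(k, ell, dM, dE)
```

Since every member of this two-parameter family produces the same output, no number of measurements of M alone can separate them; the theorem only promises to separate parameters that are distinguishable in the first place.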

More interestingly, let us now suppose that in our experiments we can affect the degradation 
rate b. Specifically, suppose that another substance, which binds to (and hence neutralizes) the 
enzyme E, is added at a concentration u which we can choose, and mass action kinetics controls 
the binding. We pick units of u such that the new equations become 

    $\dot{M} = \frac{E^m}{1+E^m} - aM, \qquad \dot{E} = M - bE - uE$.        (5)

The parameters are the same as before, and neither $\chi$ nor $h$ needs to be modified; the only change
is that now $f(M,E,M_0,E_0,m,a,b,u) = \left(\frac{E^m}{1+E^m} - aM,\; M - bE - uE\right)$ because there is an explicit
input in the system description. We will assume that the concentration u is kept constant at
a value $u_0$ for the duration of the experiment, so that the set of possible experiments is now
specified by two numbers instead of one: a pair $(u_0,T)$ consisting of the concentration $u_0 \ge 0$
(so, $u_0 = 0$ means no neutralization) and the length of the time interval T at the end of which
we measure M(T). Here the set $\mathcal{U}$ consists of all possible nonnegative constant mappings
$[0,\infty) \to \mathbb{R}$; thus $(u_0,T)$ represents the input function $u \equiv u_0$ used on the interval [0, T]. In
this case, it turns out, every pair of distinct parameters is distinguishable (see Section 5 for
the calculation), so the general theorem implies that a set of measurements taken at a random
choice of 2r+1 = 11 instants is sufficient in order to identify $(M_0,E_0,m,a,b)$. As yet another
variation of this example, one might be able, for instance, to change the concentration u(t) in
a linearly increasing fashion: $u(t) = u_0 + u_1 t$, with $u_0 \ge 0$ and $u_1 \ge 0$. Now the experiments
are specified by triples $(u_0,u_1,T)$, where again T stands for the time at which we take the
measurement, and $\mathcal{U}$ consists of all linear functions as above.
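
To make the notion of an experiment $(u_0,T)$ concrete, here is a small numerical sketch, offered only as an illustration. It integrates (5) with a hand-written classical Runge-Kutta scheme; the function name `simulate_M`, the step count, and the two sample parameters are assumptions chosen for this example. Both sample parameters are equilibria of the uncontrolled system, so they produce $M \equiv 1$ when $u_0 = 0$, while a nonzero $u_0$ separates them.

```python
def simulate_M(x, u0, T, steps=2000):
    """Integrate system (5) with constant input u0 by classical RK4; return M(T)."""
    M, E, m, a, b = x  # x = (M0, E0, m, a, b); M and E start at the initial state

    def rhs(M, E):
        return (E**m / (1.0 + E**m) - a * M, M - b * E - u0 * E)

    h = T / steps
    for _ in range(steps):
        k1 = rhs(M, E)
        k2 = rhs(M + 0.5 * h * k1[0], E + 0.5 * h * k1[1])
        k3 = rhs(M + 0.5 * h * k2[0], E + 0.5 * h * k2[1])
        k4 = rhs(M + h * k3[0], E + h * k3[1])
        M += h * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0
        E += h * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0
    return M


# Two sample parameters (assumed for illustration); each is an equilibrium when u0 = 0.
x1 = (1.0, 1.0, 1.0, 0.5, 1.0)
x2 = (1.0, 2.0, 2.0, 0.8, 0.5)

# Experiment (u0, T) = (0, 1): both outputs equal 1, so this experiment cannot separate x1, x2.
print(simulate_M(x1, 0.0, 1.0), simulate_M(x2, 0.0, 1.0))
# Experiment (u0, T) = (1, 1): the outputs differ, so this experiment distinguishes x1 from x2.
print(simulate_M(x1, 1.0, 1.0), simulate_M(x2, 1.0, 1.0))
```

A single numerical run of this kind only exhibits one distinguishing experiment for one pair; the statement that every pair of distinct parameters is distinguishable is what the calculation referred to in the text establishes.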

1.2 The Main Result 

We will define an "analytically parametrized system" as one in which the system of differential
equations, the initial state map $\chi$, and the observation map h are all expressed as real-analytic
functions of states, parameters, and inputs, and also inputs depend analytically on a finite
number of parameters. Recall that real-analytic maps are those that may be represented locally 
about each point of their domain by a convergent power series. This includes most reasonable 
nonlinearities one may think of: rational functions, roots and exponents (as long as away 
from singularities), trigonometric functions, logarithms and exponentials, and so forth, but not 
switching discontinuities nor smooth "patchings" between discontinuous pieces. (The results 
are definitely not true if analyticity is relaxed to just infinite differentiability, as we shall show 
by counterexample.) 

We will also define precisely what we mean by "a randomly chosen set of 2r+1 experiments"
and by "distinguishability" of parameters. All these definitions express the intuitive ideas
conveyed by the terms, but are necessarily somewhat technical so we defer them until after the
statement of our main theorem.

Theorem 1 Assume given an analytically parametrized system, and let r be the dimension of
its parameter space. Then, for any randomly chosen set of 2r+1 experiments, the following
property holds: for any two parameters that are distinguishable, one of the experiments in this 
set will distinguish them. 

Let us now make the terms precise. In order to properly define analytic maps, we need that 
states z(t), parameters x, input values u(t), and measurement values y(t) all belong to analytic 
manifolds. Examples of such manifolds are all open subsets of a Euclidean space, which are
indeed the most common situation in applications; for instance, in biological applications states
and parameters are usually given by vectors with strictly positive entries. But using manifolds
more general than open subsets of Euclidean spaces allows one to model constraints (such as
the fact that a state may be an angle, i.e. an element of the unit circle), and adds no complexity
to the proof of the main result. We do not lose generality, however, in assuming that $y(t) \in \mathbb{R}^p$
for some integer p, since every analytic manifold can be embedded in some Euclidean space.

Formally, we define analytically parametrized systems as 9-tuples:

    $\Sigma = (M, U, X, \Lambda, p, f, \chi, h, \mu)$

where

• the state-space $M$, input-value space $U$, parameter space $X$, and experiment space $\Lambda$ are
real-analytic manifolds,

• $p$ is a positive integer,

• $f : M \times U \times X \to TM$ (the tangent bundle of $M$), $\chi : X \to M$, $h : M \times U \times X \to \mathbb{R}^p$, and
$\mu : \Lambda \times \mathbb{R} \to U$ are real-analytic maps, and

• $f$ is a vector field on $M$, that is, $f(z,u,x) \in T_z M$ for each $(z,u,x)$.

An additional technical assumption, completeness, will be made below. 
Given a system $\Sigma$, we associate to it its response, the mapping

    $\beta_\Sigma : X \times \Lambda \to \mathbb{R}^p$

obtained, for each system parameter $x \in X$ and experiment $\lambda \in \Lambda$, by solving the initial-value
problem (1)-(2) with the input $u(t) = \mu(\lambda,t)$, and then evaluating the output (3) at the final
time t = 1. (We are arbitrarily normalizing the time interval to [0, 1], but varying lengths can
be easily incorporated into the formalism, as we discuss later.) Thus, the class of inputs $\mathcal{U}$ is
the set of all maps $t \mapsto \mu(\lambda,t)$. More precisely: we consider the solution $z(\cdot)$ of the differential
equation $(dz/dt)(t) = g(t,z(t))$ with initial condition $z(0) = \chi(x)$, where $g$ is the time-varying
vector field defined by $g(t,\zeta) = f(\zeta, \mu(\lambda,t), x)$. Since $f(\zeta, \mu(\lambda,t), x)$ depends analytically on $t$,
$\zeta$, $\lambda$, and $x$, such a solution exists at least for all $t \approx 0$, and it depends analytically on $\lambda$ and $x$.
(This is a standard fact about differential equations; see e.g. Proposition C.3.12 in [p5[ , viewing
parameters as constant additional states, and note that the manifold case follows easily from
the Euclidean case.) We make the following completeness assumption: the solution z exists on
the interval [0,1]. Now we define $\beta_\Sigma(x,\lambda) := h(z(1), \mu(\lambda,1), x)$. Thus, $\beta_\Sigma$ is a real-analytic
function.

We define two parameters $x_1$ and $x_2$ to be distinguishable if there exists some experiment $\lambda$
which distinguishes between them:

    $\beta_\Sigma(x_1,\lambda) \ne \beta_\Sigma(x_2,\lambda)$.


We next discuss the notion of "random" set of 2r+1 experiments. In general, we will say that
a property holds generically (often the term "residual" is used for this concept) on a topological
space S if the set of elements $S_0 \subseteq S$ for which the property holds contains the intersection of a
countable family of open dense sets. Such sets are "large" in a topological sense, and for all the
spaces that we consider, the "Baire property" holds: generic subsets $S_0$ are dense in S. When
S is a manifold, there is a well-defined concept of measure zero subset, see e.g. [[R], 14]; in that
case we will say that a subset $S_0 \subseteq S$ has full measure if its complement $S \setminus S_0$ has measure
zero. (See e.g. [17] for a comparison of these two alternative concepts of "large" subset of S. For
an extension to infinite dimensional spaces of the notion of full measure, called "prevalence,"
see [15].) We will show that sets of 2r+1 experiments which distinguish are both generic and of
full measure.

In general, for any set M and any positive integer s, as in [10] we denote by $M^{(s)}$ the subset
of $M^s$ made up of all sequences $(\xi_1,\ldots,\xi_s)$ consisting of distinct elements ($\xi_i \ne \xi_j$ for each
$i \ne j$). Then, we say that a property (P) of q-element sets of experiments holds for a random
set of q experiments, where q is a positive integer, provided that

    $\mathcal{G}_q = \{ (\lambda_1,\lambda_2,\ldots,\lambda_q) \in \Lambda^{(q)} \mid \text{(P) holds for the set } \{\lambda_1,\lambda_2,\ldots,\lambda_q\} \}$

is generic and of full measure in $\Lambda^{(q)}$. Now all the terms in the statement of Theorem 1 have
been defined.



Remark 1.1 In applications, often we may not want to restrict the space $\Lambda$ which parametrizes
experiments to be a manifold. For instance, in the examples in Section 1.1, we took
constant or linear nonnegative inputs, meaning that $\Lambda = [0,\infty)$ or $[0,\infty)^2$ respectively. Such
generalizations present no difficulty, and can be handled by the theory in several alternative
ways. The simplest is just to pick a different parametrization of inputs. For example, we may
write constant nonnegative inputs as $u_0^2$, with $u_0 \in \Lambda = \mathbb{R}$, and linear nonnegative inputs as
$u_0^2 + u_1^2 t$, with $(u_0,u_1) \in \Lambda = \mathbb{R}^2$. Another possibility is to use a smaller $\Lambda$ which is dense in
the original one; for instance, we may use $\Lambda = (0,\infty)$ for the constant input case, and note
that, by continuity of $\beta_\Sigma(x,\lambda)$ on $\lambda$, distinguishability is not affected when using such restricted
experiments. Finally, the best alternative would be to prove the theorem for more general sets
$\Lambda$ (and $X$ too), namely arbitrary "subanalytic" subsets of analytic manifolds; we did not do so
in order to keep the presentation as simple as possible. □



Remark 1.2 We normalized the time interval to [0,1]. However, we can easily include the
case in which observations may be performed on the system at different times. We simply
view measurements taken at different times as different experiments, adding the final time T
as a coordinate to the specification of experiments, just as we did in the examples discussed in
Section 1.1. Formally, we define the map

    $\tilde\beta_\Sigma : X \times \Lambda \times (0,\infty) \to \mathbb{R}^p : (x,\lambda,T) \mapsto h(z(T), \mu(\lambda,T), x)$

and say that two parameters $x_1$ and $x_2$ are distinguishable on varying intervals if there exist
$\lambda \in \Lambda$ and $T > 0$ such that $\tilde\beta_\Sigma(x_1,\lambda,T) \ne \tilde\beta_\Sigma(x_2,\lambda,T)$. Theorem 1 gives as a corollary that, for
a random set of pairs of the form $\{(\lambda_1,T_1), (\lambda_2,T_2), \ldots, (\lambda_{2r+1},T_{2r+1})\}$, if any two parameters
$x_1$ and $x_2$ are distinguishable on varying intervals, then $\tilde\beta_\Sigma(x_1,\lambda_i,T_i) \ne \tilde\beta_\Sigma(x_2,\lambda_i,T_i)$ for at
least one of the 2r+1 pairs in this set. To prove this corollary, we introduce a new set of
experiments $\tilde\Lambda := \Lambda \times (0,\infty)$, and a new analytically parametrized system

    $\tilde\Sigma = (M, \tilde U, X, \tilde\Lambda, p, \tilde f, \chi, \tilde h, \tilde\mu)$

with the property that $\tilde\beta_\Sigma(x,\lambda,T) = \beta_{\tilde\Sigma}(x,(\lambda,T))$ for all $(x,\lambda,T)$. Clearly, the desired
conclusion follows for $\Sigma$ when the theorem is applied to $\tilde\Sigma$. To define $\tilde\Sigma$, we note that we must
have

    $h(z(T), \mu(\lambda,T), x) = \tilde h(\tilde z(1), \tilde\mu((\lambda,T),1), x)$        (6)

for all $(x,\lambda,T)$, where $\tilde z$ solves $(d\tilde z/dt)(t) = \tilde f(\tilde z, \tilde\mu((\lambda,T),t), x)$ with initial condition
$\tilde z(0) = \chi(x)$. This can be accomplished by reparametrizing time according to the experiment duration:

    $\tilde U := (0,\infty) \times U, \qquad \tilde\mu((\lambda,T),t) := (T, \mu(\lambda,Tt))$,

    $\tilde f(z,(u_0,u),x) := u_0 f(z,u,x), \qquad \tilde h(z,(u_0,u),x) := h(z,u,x)$.

It is easy to verify that (6) holds with these choices, since $\tilde z(t) = z(Tt)$. □
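
The time reparametrization in this remark can be checked numerically. The sketch below is only an illustration under assumptions not in the text: a scalar toy system $f(z,u,x) = -xz + u$, ramp inputs $\mu(\lambda,t) = \lambda t$, initial state $\chi(x) = 1$, output $h = z$, and a hand-written Runge-Kutta integrator `rk4`. It compares the output at time T of the original system with the output at time 1 of the rescaled system.

```python
def rk4(g, z0, t_end, steps=1000):
    """Classical RK4 for the scalar equation dz/dt = g(t, z) on [0, t_end]."""
    z, t, h = z0, 0.0, t_end / steps
    for _ in range(steps):
        k1 = g(t, z)
        k2 = g(t + 0.5 * h, z + 0.5 * h * k1)
        k3 = g(t + 0.5 * h, z + 0.5 * h * k2)
        k4 = g(t + h, z + h * k3)
        z += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
        t += h
    return z


# Toy system (assumed for illustration only).
def f(z, u, x):
    return -x * z + u

def mu(lam, t):
    return lam * t

def chi(x):
    return 1.0


# Rescaled system: the input value carries the duration T as an extra coordinate u0.
def f_tilde(z, u_tilde, x):
    u0, u = u_tilde
    return u0 * f(z, u, x)

def mu_tilde(lam_T, t):
    lam, T = lam_T
    return (T, mu(lam, T * t))


def response_at_T(x, lam, T):
    """Output h = z of the original system at time T."""
    return rk4(lambda t, z: f(z, mu(lam, t), x), chi(x), T)

def response_rescaled(x, lam, T):
    """Output h = z of the rescaled system at the normalized time 1."""
    return rk4(lambda t, z: f_tilde(z, mu_tilde((lam, T), t), x), chi(x), 1.0)


print(response_at_T(0.7, 1.3, 2.5), response_rescaled(0.7, 1.3, 2.5))
```

The two printed values agree up to rounding, which is the numerical counterpart of identity (6) for this toy system.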



The rest of this note is organized as follows. Section 2 studies the more abstract problem
of distinguishability for "response" maps $\beta$ which do not necessarily arise from systems $\Sigma$, and
presents a general 2r+1 theorem for such responses. That result already appeared, expressed
in a slightly different manner, as a technical step in [23], so we do not include all the details of
the proof, and in particular the technical material on analytic functions, which can be found
in that reference. Also in Section 2, we present results for merely smooth mappings: one
showing that r experiments are enough for local identification, and another one on genericity.
Section 3 specializes to responses linear in inputs, a class of responses which is of interest
because of relations to Whitney's embedding theorem and data reduction (cf. Section 6), and,
in Section 3.1, we provide a nontrivial example in this class showing that the bound 2r+1 cannot
be improved in the analytic case. Section 4 shows how the results apply to the system case,
proving Theorem 1. Section 5 completes the discussion of the second example in Section 1.1,
showing that all pairs of parameters are distinguishable. Finally, we close with Section 6, where
we provide general comments and discuss relations to other work.



2 Parameter Distinguishability for Maps 

In this section, we consider maps, which we call responses

    $\beta : X \times U \to \mathbb{R}$

where X and U are two differentiable (connected, second countable) manifolds. We may view a
map $\beta(x,\cdot) : U \to \mathbb{R}$, for each fixed (vector) "parameter" x, as the (scalar) response of a system
to "inputs" in U (in the application to systems, these will be elements in $\Lambda$ which parametrize
continuous-time inputs). In typical applications, X is an open subset of a Euclidean space $\mathbb{R}^r$
and U is an open subset of some $\mathbb{R}^m$. In general, we call X the parameter space and U the input
space, and use respectively r and m to denote their dimensions.

The results of most interest here will be those in which $\beta$ is a (real-) analytic map (that is, $\beta$
may be represented by a locally convergent power series around each point in $X \times U$), in which
case we assume implicitly that X and U are analytic manifolds. However, we also will make
some remarks which apply to the more general cases when $\beta$ is merely smooth, i.e. infinitely
differentiable (and X and U are smooth manifolds), or even just continuously differentiable.

Given a response $\beta$ and any subset of inputs $U_0 \subseteq U$, two parameters $x_1$ and $x_2$ are said to
be indistinguishable by inputs in $U_0$, and we write

    $x_1 \sim_{U_0} x_2$,

if

    $\beta(x_1,u) = \beta(x_2,u) \quad \forall u \in U_0$.

If this property holds with $U_0 = U$, we write $x_1 \sim x_2$ and simply say that $x_1$ and $x_2$ are
indistinguishable; this means that $\beta(x_1,u) = \beta(x_2,u)$ for all $u \in U$: the response is the same,
for all possible inputs, whether the parameter is $x_1$ or $x_2$. If, instead, there exists some $u \in U$
such that $\beta(x_1,u) \ne \beta(x_2,u)$, we say that $x_1$ and $x_2$ are distinguishable, and write $x_1 \not\sim x_2$.

We say that a set $U_0$ is a universal distinguishing set if

    $x_1 \sim_{U_0} x_2 \iff x_1 \sim x_2$

which means that if two parameters can be distinguished at all, then they can be distinguished 
on the basis of inputs taken from the subset Uo alone. 

A useful notation is as follows. For each fixed positive integer q, we extend the function
$\beta : X \times U \to \mathbb{R}$ to a function

    $\beta_q : X \times U^q \to \mathbb{R}^q : (x,u_1,\ldots,u_q) \mapsto \begin{pmatrix} \beta(x,u_1) \\ \vdots \\ \beta(x,u_q) \end{pmatrix}$

and, with some abuse of notation, we drop the subscript q when clear from the context. Then,
saying that the finite subset $U_0 = \{u_1,\ldots,u_q\}$ is a universal distinguishing set amounts to the
following property holding for all $x_1,x_2 \in X$:

    $(\exists u \in U)\,(\beta(x_1,u) \ne \beta(x_2,u)) \;\Rightarrow\; \beta(x_1,u_1,\ldots,u_q) \ne \beta(x_2,u_1,\ldots,u_q)$.        (7)
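
The definitions above can be exercised on a toy response. The sketch below is an illustration only: the response $\beta(x,u) = x^2 u$ with $X = U = \mathbb{R}$ and the helper names `beta_q` and `indistinguishable_on` are assumptions made for this example. Here $x$ and $-x$ are indistinguishable, the single input $u = 1$ forms a universal distinguishing set, and the single input $u = 0$ does not.

```python
def beta(x, u):
    """Toy analytic response, assumed for illustration: beta(x, u) = x**2 * u."""
    return x * x * u


def beta_q(x, inputs):
    """Stacked response (beta(x, u_1), ..., beta(x, u_q)) for a finite input list."""
    return tuple(beta(x, u) for u in inputs)


def indistinguishable_on(x1, x2, inputs, tol=1e-12):
    """True when x1 and x2 give the same responses on every input in the list."""
    return all(abs(a - b) <= tol
               for a, b in zip(beta_q(x1, inputs), beta_q(x2, inputs)))


# x and -x agree on every input, so they are indistinguishable.
print(indistinguishable_on(2.0, -2.0, [0.3, 1.7, -4.0]))
# The input u = 1 separates 2 from 1; equality at u = 1 forces x1**2 == x2**2,
# hence equality for all u, so {1} is a universal distinguishing set.
print(indistinguishable_on(2.0, 1.0, [1.0]))
# The input u = 0 separates nothing, so {0} is not a universal distinguishing set.
print(indistinguishable_on(2.0, 1.0, [0.0]))
```

In this example one well-chosen input suffices, which is consistent with the remark that fewer than 2r+1 experiments may be enough for a particular response.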



2.1 Global Analytic Case: 2r+1 Experiments are Enough

We now turn to the main theorem of this section. It states that 2r+1 (recall $r = \dim X$)
experiments are sufficient for distinguishability, and, moreover, that a random set of 2r+1
inputs is good enough, if $\beta$ is analytic. Later, we show that the bound 2r+1 is best possible.
This theorem was already proved, using slightly different terminology, in [23]. We provide here
the outline of the proof, but for the main technical step, concerning real-analytic manifolds, we
will refer the reader to [23].



We express our result in terms of the set $\mathcal{G}_{\beta,q}$ consisting of those ("good") sequences
$(u_1,\ldots,u_q) \in U^{(q)}$ which give rise, for a given response $\beta$, to universal distinguishing sets
$U_0 = \{u_1,\ldots,u_q\}$ of cardinality q, and its complement $U^{(q)} \setminus \mathcal{G}_{\beta,q}$, the ("bad") sequences:

    $\mathcal{B}_{\beta,q} := \{ w \in U^{(q)} \mid \exists\, x_1,x_2 \in X \text{ s.t. } x_1 \not\sim x_2 \text{ and } \beta(x_1,w) = \beta(x_2,w) \}$.        (8)

Theorem 2 Assume that $\beta$ is analytic. Then the set $\mathcal{B}_{\beta,2r+1}$ is a countable union of embedded
analytic submanifolds of $U^{2r+1}$ of positive codimension. In particular, $\mathcal{G}_{\beta,2r+1}$ is generic and of
full measure in $U^{(2r+1)}$.

We give the proof after establishing a general technical fact. For each fixed positive integer
q, we introduce the following set:

    $\mathcal{P}_{\beta,q} := \{ ((x_1,x_2),w) \in X^{(2)} \times U^{(q)} \mid x_1 \not\sim x_2 \text{ and } \beta(x_1,w) = \beta(x_2,w) \}$.        (9)

This is a "thin" (possibly empty) subset of $X^{(2)} \times U^{(q)}$:

Lemma 2.1 The set $\mathcal{P}_{\beta,q}$ is a countable union of submanifolds of $X^{(2)} \times U^{(q)}$, each of which has
dimension $\le qm + 2r - q$.



Proof. The proof is included in [23]; let us recall the main steps. We first consider the set of
pairs of parameters which can be distinguished from each other:

    $\mathcal{X} := \{ (x_1,x_2) \in X^{(2)} \mid x_1 \not\sim x_2 \}$.        (10)

If this set is empty then $\mathcal{P}_{\beta,q}$ is also empty, and we are done. Otherwise, $\mathcal{X}$ is an open subset of
$X^{(2)}$, and hence an open subset of $X^2$, and is thus a manifold of dimension 2r. We let $\mathcal{U}(x_1,x_2)$
be the set consisting of those $u \in U$ such that $x_1 \sim_{\{u\}} x_2$, that is, $\beta(x_1,u) = \beta(x_2,u)$. If
$(x_1,x_2) \in \mathcal{X}$, then $\mathcal{U}(x_1,x_2)$ is
an analytic subset (a set defined by zeroes of analytic functions) of U of dimension at most
$m - 1$, since it is the set where the nonzero analytic function $\beta(x_1,u) - \beta(x_2,u)$ vanishes, and
U is connected. Therefore, its Cartesian product $(\mathcal{U}(x_1,x_2))^q$ is an analytic subset of $U^q$ of
dimension at most $q(m-1)$ (Proposition A.2, Part 3, in [23]). Next, for each $(x_1,x_2) \in \mathcal{X}$, we
consider the following subset of $U^{(q)}$:

    $T(x_1,x_2) = \{ w \in U^{(q)} \mid \beta(x_1,w) = \beta(x_2,w) \}$.

Clearly, $T(x_1,x_2)$ is a semianalytic subset, i.e. a set defined by analytic equalities (responses
are equal) and inequalities (the coordinates of w are distinct). The key point is that $T(x_1,x_2)$
has dimension at most $q(m-1)$, because it is a subset of $(\mathcal{U}(x_1,x_2))^q$.

The set $\mathcal{P}_{\beta,q}$ is also semianalytic. Let $\pi_1 : X^{(2)} \times U^{(q)} \to X^{(2)}$ be the projection on the
first factor. For each $(x_1,x_2) \in \mathcal{X}$, $\pi_1^{-1}(x_1,x_2) \cap \mathcal{P}_{\beta,q} = T(x_1,x_2)$ has dimension at most
$q(m-1)$. Applying then Proposition A.2, Part 2, in [23], and using that $\mathcal{X}$ is an analytic
manifold of dimension 2r, it follows that $\dim \mathcal{P}_{\beta,q} \le 2r + q(m-1) = 2r + qm - q$. It is known
from stratification theory (cf. J3[ ||, ffil, and the summary in the Appendix to [23]) that any
semianalytic set is a countable union of embedded analytic submanifolds, so the Lemma is
proved. ∎

Proof of Theorem 2. Letting $\pi_2 : X^{(2)} \times U^{(q)} \to U^{(q)}$ be the projection on the second factor,
we have that $\pi_2(\mathcal{P}_{\beta,q}) = \mathcal{B}_{\beta,q}$. Again from stratification theory, we know that the image under an
analytic map of a countable union of (embedded, analytic) submanifolds of dimension $\le p$ is again
a countable union of submanifolds of dimension $\le p$. In the particular case that q = 2r+1, and applying
Lemma 2.1, this means that $\mathcal{B}_{\beta,2r+1}$ is a countable union of submanifolds $M_j$ of $U^{2r+1}$ of
dimension $\le p$, where $p = (2r+1)m + 2r - (2r+1) = (2r+1)m - 1$. Since $\dim U^{2r+1} = (2r+1)m$,
each $M_j$ has positive codimension. Embedded submanifolds are, in local coordinates, proper
linear subspaces (and one may cover them by countably many charts), so the complement of
each $M_j$ is generic and of full measure, from which it follows that $\mathcal{G}_{\beta,2r+1}$ is generic and of full
measure, as wanted. ∎
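
The dimension count behind the number 2r+1 is elementary arithmetic, and can be sketched as follows. The function names are hypothetical; the only input from the text is the bound $qm + 2r - q$ of Lemma 2.1 and the fact that the space of q-tuples of inputs has dimension $qm$.

```python
def bad_pairs_dim_bound(r, m, q):
    """Upper bound qm + 2r - q from Lemma 2.1 on the dimension of the pieces."""
    return q * m + 2 * r - q


def smallest_q_with_positive_codim(r, m):
    """Smallest q for which the bound is strictly below dim U^q = q*m."""
    q = 1
    while bad_pairs_dim_bound(r, m, q) >= q * m:
        q += 1
    return q


for r in range(1, 5):
    for m in range(1, 4):
        # The bound drops below q*m exactly when q > 2r, i.e. first at q = 2r + 1,
        # independently of the input dimension m.
        print(r, m, smallest_q_with_positive_codim(r, m))
```

This count only shows where the argument starts to work; that fewer than 2r+1 experiments can genuinely fail is a separate statement, which the text addresses by example in Section 3.1.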



2.2 Local Case: r Experiments are Enough 

If only distinguishability of parameters near a given parameter is desired, then r, rather than
2r+1, experiments are sufficient, at least for a generic subset of parameters. To make this fact
precise, we introduce local versions of the sets of "good" and "bad" inputs. For each open
subset $V \subseteq X$, we let

    $\mathcal{B}_{\beta,q,V} := \{ w \in U^{(q)} \mid \exists\, x_1,x_2 \in V \text{ s.t. } x_1 \not\sim x_2 \text{ and } \beta(x_1,w) = \beta(x_2,w) \}$

and $\mathcal{G}_{\beta,q,V} = U^{(q)} \setminus \mathcal{B}_{\beta,q,V}$. When $V = X$, these are $\mathcal{B}_{\beta,q}$ and $\mathcal{G}_{\beta,q}$ respectively.

Proposition 2.2 Assume that $\beta(\cdot,u)$ is continuously differentiable for each $u \in U$. Then there
exists an open dense subset $X_0$ of X with the following property: for each $x \in X_0$ there is some
neighborhood V of x such that $\mathcal{G}_{\beta,r,V}$ is nonempty.

To show the proposition, we first introduce the notion of a nonsingular parameter. For each
$x \in X$, each positive integer q, and each $w \in U^{(q)}$, we define $\rho(x,w)$ to be the rank of the
differential of $\beta_q(\xi,w)$ with respect to $\xi$, evaluated at $\xi = x$, and for each $x \in X$ define

    $\rho(x) := \max \{ \rho(x,w) \mid w \in U^{(q)},\ q \ge 1 \}$.

Observe that the maximum is always achieved at some $w \in U^{(\rho(x))}$, since if $\rho(x,u_1,\ldots,u_p) = q$
then there must exist some q-element subset of $\{u_1,\ldots,u_p\}$ for which the rank is also q. We will
say that a parameter value x is nonsingular provided that this maximal rank is locally constant
at x, that is, there is some neighborhood V of x in X such that $\rho(\xi) = \rho(x)$ for every $\xi \in V$.
The set $X_{ns}$ of nonsingular parameter values is an open and dense subset of X. Openness is
clear from the definition and density follows from this argument: suppose there existed an
open subset $W \subseteq X$ with $W \cap X_{ns} = \emptyset$; now pick a point $x \in W$ at which $\rho(x)$ is maximal
with respect to points in W; since $\beta$ is continuously differentiable in x, there is a neighborhood
of x in W where the rank is at least equal to $\rho(x)$, and thus, by maximality, it is equal to
$\rho(x)$; so $x \in X_{ns}$, a contradiction. Proposition 2.2 is then a consequence of the following, using
$X_0 = X_{ns}$, and because $\rho(x) \le r$ for all x:

Lemma 2.3 Pick any nonsingular parameter value x. Then there is some open neighborhood
V of x such that $\mathcal{G}_{\beta,\rho(x),V} \ne \emptyset$.

Proof. We must prove that there is an open neighborhood V of x and a subset $U_0$ of U of
cardinality $q = \rho(x)$ such that $U_0$ is a universal distinguishing set for parameters in V, that is,

    $x_1 \sim_{U_0} x_2$ and $x_1, x_2 \in V \;\Rightarrow\; x_1 \sim x_2$.

If $w = (u_1,\ldots,u_q) \in U^{(q)}$ is such that $q = \rho(x) = \rho(x,w)$, a routine functional dependence
argument implies that $U_0 = \{u_1,\ldots,u_q\}$ has the desired properties, as follows. We pick a
neighborhood V of x so that, for any other $u \in U$, the rank of $\beta_{q+1}(\xi,w,u)$ with respect to $\xi$ is
also constantly equal to q on V (nonsingularity of x). Consider the mapping

    $K : V \to \mathbb{R}^q : \xi \mapsto \beta_q(\xi,w)$.

Since the rank of the differential of K is constant on V, by the Rank Theorem and after shrinking
V if needed, we know that there exist two diffeomorphisms $S : \mathbb{R}^r \to V$ and $T : W \to \mathbb{R}^q$,
where W is the image K(V), such that $T \circ K \circ S : \mathbb{R}^r \to \mathbb{R}^q$ is the canonical projection
$(z_1,\ldots,z_r) \mapsto (z_1,\ldots,z_q)$ on the first q coordinates.

Now pick any other $u \in U$, introduce

    $H : V \to \mathbb{R}^{q+1} : \xi \mapsto \beta_{q+1}(\xi,w,u) = (K(\xi), \beta(\xi,u))$

and let $F : \mathbb{R}^r \to \mathbb{R}^{q+1}$ be the map obtained as the composition $(T \times I) \circ H \circ S$, where $T \times I$
maps $(a,b) \mapsto (T(a),b)$. Since $S : \mathbb{R}^r \to V$ and $(T \times I) : W \times \mathbb{R} \to \mathbb{R}^{q+1}$ are diffeomorphisms and
H has constant rank q, also F has constant rank q. So, since the Jacobian of F has the block
form $\begin{pmatrix} I & 0 \\ * & M \end{pmatrix}$, where $*$ is irrelevant and M is the Jacobian of $\beta(S(\cdot),u)$ with respect to the
variables $z_{q+1},\ldots,z_r$, it follows that $M \equiv 0$. In other words, $\beta(S(\cdot),u)$ must be independent
of these variables, so there exists a $\varphi : \mathbb{R}^q \to \mathbb{R}$ such that $\beta(S(z),u) = \varphi(z_1,\ldots,z_q)$ for all
$z = (z_1,\ldots,z_r) \in \mathbb{R}^r$. Since $T \circ K \circ S$ is the projection on the first q variables, this means
that $\beta(S(z),u) = \varphi((T \circ K \circ S)(z)) = \varphi(T(\beta(S(z),w)))$ for all $z \in \mathbb{R}^r$. As S is onto, this is
equivalent to:

    $\beta(\xi,u) = \varphi(T(\beta(\xi,w)))$

for every $\xi \in V$. Now let us suppose that $x_1 \sim_{U_0} x_2$. By definition, $\beta(x_1,w) = \beta(x_2,w)$, and this
in turn implies

    $\beta(x_1,u) = \varphi(T(\beta(x_1,w))) = \varphi(T(\beta(x_2,w))) = \beta(x_2,u)$.

As u was arbitrary, we conclude $x_1 \sim x_2$. ∎



Remark 2.4 In general, $X_0$ must be a proper subset of X. A counterexample to equality
would be provided by $\beta(x,u) = f(x) \cdot u$ (dot indicates inner product, see Section 3 for responses
of this special form), where $x \in \mathbb{R}$ and $f : \mathbb{R} \to \mathbb{R}^2$ parametrizes a figure-8 curve. Around
the parameter value that corresponds to the crossing, no single u will distinguish. (We omit
details.) □

2.3 Smooth Case: 2r+1 Experiments are Enough, for Generic $\beta$

For infinitely differentiable but non-analytic $\beta$, the result on existence of universal distinguishing
subsets of cardinality 2r+1 does not generalize. As a matter of fact, one can exhibit a $\beta$ of
class $C^\infty$, with r = 1, with the property that there is no finite universal distinguishing set: take
$X = U = (0,\infty)$ and $\beta(x,u) := \gamma(x - u)$, where $\gamma$ is a smooth map which is nonzero on $(-\infty,0)$
and zero elsewhere. For this example, every two parameters are distinguishable (if $x \ne y$, say
$x < y$, then picking $u := (x+y)/2$ results in $\beta(x,u) \ne 0 = \beta(y,u)$). To see that there is no finite universal
distinguishing set, suppose that $U_0 \subseteq U$ is such a set; then picking $x,y > \max\{u \mid u \in U_0\}$
we have that $\beta(x,u) = 0 = \beta(y,u)$ for all $u \in U_0$, a contradiction. Observe that lack of
compactness is not the reason that finite distinguishing subsets fail to exist in the non-analytic
case. Let us sketch an example with $X = S^1$ and $U = (-\pi,\pi)$. For each $\varphi \in U$, we take $\beta(\cdot,\varphi)$
to be a "bump function" centered at $e^{i\varphi}$, with value 1 there and less than one elsewhere,
and supported on a compact set which does not contain x = 1. (Such a choice can be made
smoothly on x and u simultaneously.) If $x \ne y$ are elements of $S^1$, we can always distinguish
them by an appropriate u (if $x \ne 1$, we may take $u = \varphi$ to be the argument of x, so that $\beta(x,u) = 1$
and $\beta(y,u) < 1$, and if x = 1 and $y \ne 1$, we may take $\varphi$ to be the argument of y). However,
given any finite set of u's, the union of the supports of the corresponding bump functions is still
a compact subset of $S^1$ which does not include 1, so there is some $x \ne 1$ such that $\beta(x,u) = 0$
for all u in this set, and hence x is not distinguishable from 1.
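
The first counterexample can be made concrete. The sketch below is an illustration under an assumption not in the text: it uses the standard flat function $\gamma(s) = e^{1/s}$ for $s < 0$ and $\gamma(s) = 0$ otherwise, which is smooth, nonzero on $(-\infty,0)$, and zero elsewhere. The finite input set and the two parameters are arbitrary sample values.

```python
import math


def gamma(s):
    """Smooth non-analytic function (one standard choice, assumed here):
    exp(1/s) for s < 0 and 0 for s >= 0."""
    return math.exp(1.0 / s) if s < 0 else 0.0


def beta(x, u):
    """Response of the counterexample: beta(x, u) = gamma(x - u)."""
    return gamma(x - u)


U0 = [0.5, 1.0, 3.0]   # any finite candidate distinguishing set
x, y = 5.0, 6.0        # two parameters larger than max(U0)

# Every input in U0 gives response 0 for both parameters, so U0 cannot separate them.
print([(beta(x, u), beta(y, u)) for u in U0])

# The midpoint input does separate them: beta(x, u) != 0 = beta(y, u).
u_mid = (x + y) / 2
print(beta(x, u_mid), beta(y, u_mid))
```

The same construction works for any finite set of inputs, since one can always pick the two parameters beyond its maximum; this is exactly the step where analyticity would have forbidden a function vanishing on a whole half-line without vanishing identically.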

It is possible, however, to provide a result that holds generically on C°°(X x U, R), the set of 
smooth (that is, infinitely differentiable) responses (3 : X x U — > R, endowed with the Whitney 
topology. (Density in the Whitney topology means approximability in a very strong sense; 
see e.g. |u], [lj] for details.) Generic results in this sense are not terribly interesting, and are 
fairly meaningless in applications, since a response "close" to a given one may not have any 
interesting structure, but we present the result nonetheless for completeness. The analogue of 
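Theorem 2 is as follows.

Proposition 2.5 For generic $\beta \in C^\infty(\mathbb{X} \times \mathbb{U}, \mathbb{R})$, $\mathcal{G}_{\beta,2r+1}$ is generic and of full measure in $\mathbb{U}^{2r+1}$.

We need this analog of Lemma 2.1:

Lemma 2.6 For generic $\beta \in C^\infty(\mathbb{X} \times \mathbb{U}, \mathbb{R})$, $\mathcal{P}_{\beta,q}$ is a submanifold of $\mathbb{X}^{(2)} \times \mathbb{U}^{(q)}$ of dimension $\le qm + 2r - q$.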



Let us show first how Proposition 2.5 follows from here. In the particular case $q = 2r+1$, the Lemma gives that $\dim \mathcal{P}_{\beta,q} \le (2r+1)m - 1$. The projection $\pi_2 : \mathbb{X}^{(2)} \times \mathbb{U}^{(q)} \to \mathbb{U}^{(q)}$ restricts to a smooth map $f : \mathcal{P}_{\beta,q} \to \mathbb{U}^{(q)}$ with image $f(\mathcal{P}_{\beta,q}) = \mathcal{B}_{\beta,q}$. In general, the Morse-Sard Theorem (as stated e.g. in [14], Theorem 3.1.3) says that if $f : M \to N$ is smooth then $N \setminus f(S_f)$ is generic and has full measure, where $S_f$ is the set of critical points of $f$ (points where the differential is not onto). In our case, $\dim \mathcal{P}_{\beta,q} < (2r+1)m = \dim \mathbb{U}^{(q)}$, so $S_f = \mathcal{P}_{\beta,q}$. Thus $\mathcal{G}_{\beta,2r+1} = \mathbb{U}^{(q)} \setminus f(\mathcal{P}_{\beta,q})$ is generic and has full measure, as wanted. ∎



Lemma 2.6 is, in turn, an immediate consequence of the following fact, because $\mathcal{P}_{\beta,q}$ is an open subset of $\widehat{\mathcal{P}}_{\beta,q}$.

Proposition 2.7 Fix any positive integer $q$. Then, for generic $\beta \in C^\infty(\mathbb{X} \times \mathbb{U}, \mathbb{R})$, the set

$$\widehat{\mathcal{P}}_{\beta,q} := \{ ((x_1, x_2), w) \in \mathbb{X}^{(2)} \times \mathbb{U}^{(q)} \mid \beta(x_1, w) = \beta(x_2, w) \} \tag{11}$$

is either empty or it is a submanifold of $\mathbb{X}^{(2)} \times \mathbb{U}^{(q)}$ of dimension $qm + 2r - q$.

Proof. The proof is a routine exercise in transversality theory. We begin by recalling the Multijet Transversality Theorem (see [10] for details), in the special case of jets of order zero. (The general case for arbitrary orders, which we do not need here, would require the careful definition of jets of functions on manifolds, which in turn requires a more complicated quotient space construction.) The theorem states that, given:

• any two smooth manifolds M and N, 

• any positive integer s, and 

• any submanifold $W$ of $M^s \times N^s$,

then, for generic $\beta \in C^\infty(M, N)$, it holds that the $s$-fold 0-prolongation $j^0_s\beta$ of $\beta$ is transversal¹ to $W$, where

$$j^0_s\beta : M^{(s)} \to M^s \times N^s : (\xi_1, \ldots, \xi_s) \mapsto (\xi_1, \ldots, \xi_s, \beta(\xi_1), \ldots, \beta(\xi_s)).$$

Thus, since preimages of submanifolds under transversal maps are submanifolds of the same codimension, a generic $\beta \in C^\infty(M,N)$ has the property that $(j^0_s\beta)^{-1}(W)$ is either empty or it is a submanifold of $M^{(s)}$ of codimension (that is, $\dim M^{(s)} - \dim (j^0_s\beta)^{-1}(W)$) equal to the codimension of $W$ (that is, $\dim M^s \times N^s - \dim W$).

A typical application of this theorem is in the context of the Whitney embedding theorem, as it implies that the set of one-to-one smooth mappings from $M$ to $N$ is generic, provided that $\dim N > 2\dim M$. To see this, one just takes $s = 2$ and $W = \{(\xi_1, \xi_2, \zeta_1, \zeta_2) \mid \zeta_1 = \zeta_2\}$, which has codimension equal to $\dim N$. Then $(j^0_2\beta)^{-1}(W) \subseteq M^{(2)}$ is the set of pairs $\xi_1 \neq \xi_2$ such that $\beta(\xi_1) = \beta(\xi_2)$, and this set must be empty, since otherwise its codimension would be larger than $\dim M^{(2)} = 2\dim M$, which is nonsense. (See e.g. [10, 14, 15, 20] for such arguments.) The application to our result is very similar, and proceeds as follows.

Pick $M = \mathbb{X} \times \mathbb{U}$, $N = \mathbb{R}$, and $s = 2q$. To define $W$, we write the coordinates in $M^s$, and in particular in the subset $M^{(s)}$, in the following form:

$$((x_1, u_1), (x_2, u_2), \ldots, (x_q, u_q), (y_1, v_1), (y_2, v_2), \ldots, (y_q, v_q)) \tag{12}$$

and let $W$ consist of those elements

$$((x_1, u_1), (x_2, u_2), \ldots, (x_q, u_q), (y_1, v_1), (y_2, v_2), \ldots, (y_q, v_q), w_1, \ldots, w_q, z_1, \ldots, z_q)$$

in $M^s \times \mathbb{R}^s$ such that:

$$x_1 = x_2 = \ldots = x_q, \quad y_1 = y_2 = \ldots = y_q, \quad u_1 = v_1, \ u_2 = v_2, \ \ldots, \ u_q = v_q, \tag{13}$$

and $w_i = z_i$ for all $i = 1, \ldots, q$. Counting equations, it is clear that $W$ is a submanifold (linear subspace in the obvious local coordinates) of codimension $p := 2r(q - 1) + mq + q$.

For generic $\beta$, multijet transversality ensures that $Q = (j^0_s\beta)^{-1}(W)$ is either empty or it is a submanifold of $M^{(s)}$ of codimension $p$. The set $Q$ consists of all sequences (of distinct pairs) as in (12) for which all the equalities (13) hold as well as

$$\beta(x_1, u_1) = \beta(y_1, u_1), \quad \beta(x_2, u_2) = \beta(y_2, u_2), \quad \ldots, \quad \beta(x_q, u_q) = \beta(y_q, u_q).$$

¹ Recall that transversality of $f : P \to Q$ to a submanifold $W$ of $Q$, denoted $f \pitchfork W$, means that $Df_x(T_xP) + T_{f(x)}W = T_{f(x)}Q$ for all $x$ such that $f(x) \in W$. All that we need here is the conclusion on preimages.
Now we introduce the function $\Phi : \mathbb{X}^{(2)} \times \mathbb{U}^{(q)} \to (\mathbb{X} \times \mathbb{U})^{(2q)}$ that maps

$$((x, y), (u_1, \ldots, u_q)) \mapsto ((x, u_1), (x, u_2), \ldots, (x, u_q), (y, u_1), (y, u_2), \ldots, (y, u_q))$$

and notice that $\Phi$ establishes a diffeomorphism between $\widehat{\mathcal{P}}_{\beta,q}$ and its image $Q$. Thus, $\widehat{\mathcal{P}}_{\beta,q}$ is either empty or is a submanifold of dimension $2q(m + r) - p = qm + 2r - q$, as claimed. ∎
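The dimension count used in this proof is plain arithmetic and can be checked mechanically. The following sketch (function names are ours, introduced only for illustration) recomputes the codimension of $W$ equation by equation and confirms both the formula $qm + 2r - q$ and the fact that, for $q = 2r+1$, the resulting dimension is one less than $\dim \mathbb{U}^{(q)} = qm$.

```python
def codim_W(r: int, m: int, q: int) -> int:
    """Number of independent equations defining W."""
    x_eqs = r * (q - 1)      # x_1 = ... = x_q, each x_i has r coordinates
    y_eqs = r * (q - 1)      # y_1 = ... = y_q
    uv_eqs = m * q           # u_i = v_i, each u_i has m coordinates
    wz_eqs = q               # w_i = z_i, scalar outputs
    return x_eqs + y_eqs + uv_eqs + wz_eqs

def dim_preimage(r: int, m: int, q: int) -> int:
    """dim (X x U)^(2q) minus the codimension of W."""
    return 2 * q * (m + r) - codim_W(r, m, q)

for r in range(1, 6):
    for m in range(1, 6):
        for q in range(1, 12):
            assert dim_preimage(r, m, q) == q * m + 2 * r - q
        q = 2 * r + 1
        assert dim_preimage(r, m, q) == q * m - 1   # strictly below dim U^(q) = q*m
print("dimension count verified")
```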



3 A Special Case: Responses Linear in Inputs 

A very special case of our setup is that in which the input set is a Euclidean space, $\mathbb{U} = \mathbb{R}^m$, and $\beta$ depends linearly on the inputs $u$. Then we may write:

$$\beta(x, u) = f(x) \cdot u \tag{14}$$

(dot indicates inner product in $\mathbb{U}$), where $f$ is some mapping $\mathbb{X} \to \mathbb{R}^m$. We will denote by $S \subseteq \mathbb{R}^m$ the image $f(\mathbb{X})$ of $f$. In this special case, we have that two parameters $x_1$ and $x_2$ are indistinguishable by inputs belonging to a given subset $\mathbb{U}_0 \subseteq \mathbb{R}^m$ ($x_1 \sim_{\mathbb{U}_0} x_2$) if and only if

$$f(x_1) - f(x_2) \in \mathbb{U}_0^\perp$$

($\mathbb{U}_0^\perp = \{a \in \mathbb{R}^m \mid a \cdot u = 0 \ \forall u \in \mathbb{U}_0\}$ is the orthogonal complement of $\mathbb{U}_0$), and $x_1$ and $x_2$ are indistinguishable (case $\mathbb{U}_0 = \mathbb{U}$) if and only if $f(x_1) = f(x_2)$. Therefore, a subset $\mathbb{U}_0$ is a universal distinguishing set if and only if $(S - S) \cap \mathbb{U}_0^\perp = \{0\}$, i.e.

$$a, b \in S, \ a - b \in \mathbb{U}_0^\perp \ \Rightarrow \ a = b.$$

Let us introduce the set $\mathrm{sec}(S)$ consisting of all unit secants of $S$:

$$\mathrm{sec}(S) := \left\{ \frac{a - b}{|a - b|} \ : \ a \neq b, \ a, b \in S \right\}$$

as well as the set $u(\mathbb{U}_0^\perp)$ of unit vectors in $\mathbb{U}_0^\perp$. These are both subsets of the $(m-1)$-dimensional unit sphere $\mathbb{S}^{m-1}$. With these notations, we can say that a subset $\mathbb{U}_0$ is a universal distinguishing set if and only if $\mathrm{sec}(S) \cap u(\mathbb{U}_0^\perp) = \emptyset$, or equivalently, if and only if

$$u(\mathbb{U}_0^\perp) \subseteq \mathbb{S}^{m-1} \setminus \mathrm{sec}(S). \tag{15}$$

Any basis of $\mathbb{U}$ provides a universal distinguishing set of cardinality $m$ (since then $u(\mathbb{U}_0^\perp) = \emptyset$). On the other hand:

Proposition 3.1 There is a universal distinguishing set of cardinality $m - 1$ if and only if $\mathrm{sec}(S) \neq \mathbb{S}^{m-1}$.

Proof. If there is such a $\mathbb{U}_0$ with less than $m$ elements, then $\mathbb{U}_0^\perp \neq \{0\}$, so also $u(\mathbb{U}_0^\perp) \neq \emptyset$, and then (15) gives that $\mathrm{sec}(S) \neq \mathbb{S}^{m-1}$. Conversely, if there is any $u \in \mathbb{S}^{m-1} \setminus \mathrm{sec}(S)$, we may let $\mathbb{U}_0$ be any basis of $\{u\}^\perp$, so that $u(\mathbb{U}_0^\perp) = \{\pm u\}$. Since also $-u \in \mathbb{S}^{m-1} \setminus \mathrm{sec}(S)$, we have that $u(\mathbb{U}_0^\perp) \subseteq \mathbb{S}^{m-1} \setminus \mathrm{sec}(S)$, and thus $\mathbb{U}_0$ is a universal distinguishing set of cardinality $m - 1$. ∎
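For responses linear in inputs, the criterion "$f(x_1) - f(x_2) \in \mathbb{U}_0^\perp$" is easy to test on concrete data. The sketch below is a toy illustration in $\mathbb{R}^3$; the two image points and the function names are assumptions made for the example, not objects from the paper.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def indistinguishable_on(fa, fb, U0, tol=1e-12):
    """Image points fa = f(x1), fb = f(x2) give equal responses f(x).u on all of U0
    exactly when fa - fb is orthogonal to every u in U0."""
    d = [p - q for p, q in zip(fa, fb)]
    return all(abs(dot(d, u)) <= tol for u in U0)

# m = 3: two image points whose difference points along e3, so e3 is a unit secant.
fa, fb = (1.0, 2.0, 0.0), (1.0, 2.0, 5.0)
U0 = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]          # a basis of {e3}^perp, i.e. m - 1 inputs
print(indistinguishable_on(fa, fb, U0))                      # these m - 1 inputs fail
print(indistinguishable_on(fa, fb, U0 + [(0.0, 0.0, 1.0)]))  # a full basis separates them
```

This mirrors the proof of Proposition 3.1: a set of $m-1$ inputs works only if the unit normal to its span avoids $\mathrm{sec}(S)$.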
3.1 Examples

We will provide examples of two subsets $\mathcal{S} \subset \mathbb{R}^2$ and $\mathcal{R} \subset \mathbb{R}^3$, both images of analytic maps defined on $\mathbb{R}$, with the properties that $\mathcal{S} - \mathcal{S} = \mathbb{R}^2$ and $\mathrm{sec}(\mathcal{R}) = \mathbb{S}^2$. These examples will allow us to show that the $2r+1$ bound is best possible. We need this first:

Lemma 3.2 Pick any real $a > 0$ and any nonnegative integer $k$, and let

$$f(x) = (x + a)\sin x \tag{16}$$

for $x \in \mathbb{R}$. Then, there exists an

$$M_k \in ((2k + 1/2)\pi, (2k + 1)\pi)$$

and a continuous map

$$\alpha : [2k\pi, M_k] \to [M_k, (2k + 1)\pi]$$

such that

$$\alpha(2k\pi) = (2k + 1)\pi, \quad \alpha(M_k) = M_k, \quad \text{and} \quad f(\alpha(x)) = f(x) \ \ \forall x \in [2k\pi, M_k].$$

Proof. Consider the restriction of the function $f$ to the interval $[2k\pi, (2k + 1)\pi]$ and observe that its derivative $f'(x) = \sin x + (x + a)\cos x$ is positive for $2k\pi < x \le (2k + 1/2)\pi$, has $f'((2k + 1)\pi) = ((2k + 1)\pi + a) \cdot (-1) < 0$, and, for $(2k + 1/2)\pi < x < (2k + 1)\pi$, $f'(x) = 0$ is equivalent to

$$\tan x = -x - a,$$

which happens at a unique $x = M_k \in ((2k + 1/2)\pi, (2k + 1)\pi)$ (clear from the graph of $\tan x$ and from the fact that the graph of $-x - a$ is in the fourth quadrant). Therefore, on the interval $[2k\pi, (2k + 1)\pi]$, $f$ is strictly increasing on $[2k\pi, M_k]$ and strictly decreasing on $[M_k, (2k + 1)\pi]$. Let $f_1$ and $f_2$ be the restrictions of $f$ to $[2k\pi, M_k]$ and $[M_k, (2k + 1)\pi]$ respectively, and let

$$g := f_2^{-1} : [0, f(M_k)] \to [M_k, (2k + 1)\pi]$$

(so $g$ is a strictly decreasing continuous function). Finally, let $\alpha := g \circ f_1$. Thus $\alpha$ is a continuous function and it satisfies that $\alpha(2k\pi) = (2k + 1)\pi$ and $\alpha(M_k) = M_k$ by construction. Finally,

$$f(\alpha(x)) = f_2(\alpha(x)) = f_2(f_2^{-1}(f_1(x))) = f_1(x) = f(x)$$

for all $x \in [2k\pi, M_k]$, as desired. ∎
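The construction in this proof is effective, and a small numerical sketch may help fix ideas. The code below is only an illustration under our own choices (bisection as the root finder, the names `bisect` and `make_alpha`, and the sample values of $a$ and $k$); it computes $M_k$ as the zero of $f'$ on $((2k+1/2)\pi, (2k+1)\pi)$ and evaluates $\alpha(x)$ by inverting $f$ on the decreasing branch.

```python
import math

def bisect(F, lo, hi, iters=200):
    """Root of F on [lo, hi], assuming F(lo) >= 0 >= F(hi) or the reverse."""
    lo_nonneg = F(lo) >= 0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if (F(mid) >= 0) == lo_nonneg:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def make_alpha(a, k):
    """Return (f, M_k, alpha) for f(x) = (x + a) sin x on [2k*pi, (2k+1)*pi]."""
    f = lambda x: (x + a) * math.sin(x)
    df = lambda x: math.sin(x) + (x + a) * math.cos(x)
    right = (2 * k + 1) * math.pi
    # f' > 0 at (2k + 1/2)pi and f' < 0 at (2k + 1)pi, with a unique zero M_k between
    Mk = bisect(df, (2 * k + 0.5) * math.pi, right)

    def alpha(x):
        # point of the decreasing branch [M_k, (2k+1)pi] where f equals f(x)
        return bisect(lambda y: f(y) - f(x), Mk, right)

    return f, Mk, alpha

f, Mk, alpha = make_alpha(a=0.7, k=1)
x = 2 * math.pi + 0.3                      # a point of [2k*pi, M_k]
print(Mk, alpha(x), abs(f(alpha(x)) - f(x)))
```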

Lemma 3.3 Consider the spiral $\mathcal{S} = \{\zeta \in \mathbb{C} \mid \zeta = re^{ir}, \ r > 0\}$. Then, for each complex $z \in \mathbb{C}$ there exist two elements $\zeta_1, \zeta_2 \in \mathcal{S}$ such that $\zeta_1 - \zeta_2 = z$. That is, as a subset of $\mathbb{R}^2$, we have $\mathcal{S} - \mathcal{S} = \mathbb{R}^2$.

Proof. The idea of the proof is very simple: we first find some "chord" between two points $a$ and $b$ such that the difference $b - a$ is a multiple of the desired $z$, and its modulus is larger than that of $z$; then, we displace it in an orthogonal direction, until the resulting chord has the right length. Analytically, we proceed as follows.

Let $z = re^{i\varphi}$, with $r \ge 0$ and $\varphi \ge 0$. Without loss of generality, we assume that $r > 0$ (if $z = 0$, we just take $\zeta_1 = \zeta_2$ to be any element of $\mathcal{S}$), and $\varphi > 0$, and pick any positive integer $k$ such that $\varphi + 2k\pi > r/2$. Our goal is, thus, to show that there are reals $s, t$ such that

$$te^{it} - se^{is} = re^{i\varphi},$$

which is equivalent to asking that

$$t\sin(t - \varphi) = s\sin(s - \varphi) \tag{17}$$

and

$$t\cos(t - \varphi) - s\cos(s - \varphi) = r. \tag{18}$$

By Lemma 3.2, applied with $a = \varphi$ and the chosen $k$, there exists $M_k \in ((2k + 1/2)\pi, (2k + 1)\pi)$ and a continuous $\alpha : [2k\pi, M_k] \to [M_k, (2k + 1)\pi]$ such that $\alpha(2k\pi) = (2k + 1)\pi$, $\alpha(M_k) = M_k$, and

$$(x + \varphi)\sin x = (\alpha(x) + \varphi)\sin\alpha(x) \tag{19}$$

for all $x \in [2k\pi, M_k]$. Let

$$\gamma(t) := t\cos(t - \varphi) - [\varphi + \alpha(t - \varphi)]\cos(\alpha(t - \varphi)), \quad t \in [\varphi + 2k\pi, \varphi + M_k].$$

Then,

$$\gamma(\varphi + 2k\pi) = (\varphi + 2k\pi)\cos(2k\pi) - (\varphi + 2k\pi + \pi)\cos(2k\pi + \pi) = 2(\varphi + 2k\pi) + \pi > r$$

and

$$\gamma(\varphi + M_k) = (\varphi + M_k)\cos M_k - (\varphi + M_k)\cos M_k = 0.$$

So, since $\gamma$ is continuous, there is some $t_0 \in [\varphi + 2k\pi, \varphi + M_k]$ such that $\gamma(t_0) = r$, which means that

$$t_0\cos(t_0 - \varphi) - s_0\cos(s_0 - \varphi) = r$$

with $s_0 := \varphi + \alpha(t_0 - \varphi)$. Moreover, evaluating (19) at $x = t_0 - \varphi$ gives that

$$t_0\sin(t_0 - \varphi) = s_0\sin(s_0 - \varphi),$$

and the proof is now complete. ∎
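The proof just given is constructive, so the chord can be computed. The sketch below follows the same steps numerically; it is an illustration only, with bisection standing in for the intermediate value theorem, and with the helper names and the convention $\varphi \in (0, 2\pi]$ being our assumptions.

```python
import cmath
import math

def bisect(F, lo, hi, iters=200):
    """Root of F on [lo, hi], assuming F changes sign between the endpoints."""
    lo_nonneg = F(lo) >= 0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if (F(mid) >= 0) == lo_nonneg:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def spiral_chord(z):
    """Return (t, s) with t*e^{it} - s*e^{is} = z, for a nonzero complex z."""
    r = abs(z)
    phi = cmath.phase(z) % (2 * math.pi) or 2 * math.pi     # argument taken in (0, 2*pi]
    k = 1
    while phi + 2 * k * math.pi <= r / 2:                   # need phi + 2k*pi > r/2
        k += 1
    f = lambda x: (x + phi) * math.sin(x)                   # Lemma 3.2 with a = phi
    df = lambda x: math.sin(x) + (x + phi) * math.cos(x)
    right = (2 * k + 1) * math.pi
    Mk = bisect(df, (2 * k + 0.5) * math.pi, right)
    alpha = lambda x: bisect(lambda y: f(y) - f(x), Mk, right)

    def gamma_minus_r(t):
        y = alpha(t - phi)
        return t * math.cos(t - phi) - (phi + y) * math.cos(y) - r

    # gamma > r at the left endpoint and gamma = 0 < r at the right endpoint
    t0 = bisect(gamma_minus_r, phi + 2 * k * math.pi, phi + Mk)
    s0 = phi + alpha(t0 - phi)
    return t0, s0

z = 3 + 4j
t, s = spiral_chord(z)
print(t, s, abs(t * cmath.exp(1j * t) - s * cmath.exp(1j * s) - z))
```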

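Lemma 3.4 Consider the following subset of $\mathbb{R}^3$:

$$\mathcal{R} = \{(\zeta, \xi) \in \mathbb{C} \times \mathbb{R} \mid \zeta = e^{ir}, \ \xi = e^{r}\sin 2r, \ r > 0\}.$$

Then, for each $w \in \mathbb{C} \times \mathbb{R}$ there exist two elements $\omega_1, \omega_2 \in \mathcal{R}$ and a real number $\chi > 0$ such that $\omega_1 - \omega_2 = \chi w$. That is, as a subset of $\mathbb{R}^3$, $\mathrm{sec}(\mathcal{R}) = \mathbb{S}^2$.

Proof. We will use the real functions

$$g(x) := e^{x}\sin 2x \quad \text{and} \quad f(x) := g(x - \pi/2).$$

Representing complex numbers $\zeta$ in the form $re^{i\varphi}$, with $r$ not necessarily positive but the argument $\varphi$ restricted to $[0, \pi)$, we may rephrase the claim of the lemma as follows: for each $r \in \mathbb{R}$, $0 \le \varphi < \pi$, and $p \in \mathbb{R}$ there exist $\chi > 0$ and $t, s > 0$ such that

$$\begin{pmatrix} e^{it} - e^{is} \\ -e^{\pi/2}\,(f(t) - f(s)) \end{pmatrix} = \chi \begin{pmatrix} re^{i\varphi} \\ p \end{pmatrix} \tag{20}$$

(note that $g(x) = -e^{\pi/2} f(x)$), and we may assume that $w = (r, p) \neq 0$, since when $w = 0$ we may pick any $\chi$ and $\omega_1 = \omega_2 =$ any element of $\mathcal{R}$. Equality of the first (complex) components in (20) amounts to asking that the following two equations hold:

$$\sin(t - \varphi) = \sin(s - \varphi), \quad \cos(t - \varphi) - \cos(s - \varphi) = \chi r, \tag{21}$$

and for this, in turn, it is sufficient to find $t, s \in \mathbb{R}$ such that:

$$t + s = \pi + 2\varphi, \quad \cos(\pi + \varphi - s) - \cos(s - \varphi) = \chi r.$$

The last of these equations simplifies to $-\cos(s - \varphi) = \chi r/2$. In summary, for any given $r, \varphi, p$ we must find two real numbers $\chi$ and $s$ such that (absorbing $-e^{-\pi/2}$ into $p$):

$$f(\pi + 2\varphi - s) - f(s) = \chi p, \quad -\cos(s - \varphi) = \chi r/2.$$

Or, letting $\theta := s - \varphi - \pi/2$, and in terms of $g(x) = f(x + \pi/2)$, so that $f(\pi + 2\varphi - s) - f(s) = g(\varphi - \theta) - g(\varphi + \theta)$ (we absorb this further change of sign into $p$), the problem is to solve the following two simultaneous equations for $\theta$:

$$g(\varphi + \theta) - g(\varphi - \theta) = \chi p, \quad \sin\theta = \chi r/2. \tag{22}$$

In the special case that $r = 0$, we have $p \neq 0$ (recall $(r, p) \neq 0$), so we may pick $\theta := \pi$ and $\chi := [g(\varphi + \pi) - g(\varphi - \pi)]/p$. Thus, we assume $r \neq 0$ from now on.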

We will show that there is some $\theta$ which is not of the form $k\pi$ for any integer $k$, such that

$$\frac{g(\varphi + \theta) - g(\varphi - \theta)}{\sin\theta} = \frac{2p}{r}. \tag{23}$$

Once such a $\theta$ is found, we may simply let $\chi := (2/r)\sin\theta$, from which it follows that $\sin\theta = \chi r/2$ and $g(\varphi + \theta) - g(\varphi - \theta) = (2/r)\,p\sin\theta = \chi p$, so (22) holds as desired.

Letting $q := (2p/r)e^{-\varphi}$, we restate our goal as that of solving

$$\frac{e^{\theta}\sin(2(\varphi + \theta)) - e^{-\theta}\sin(2(\varphi - \theta))}{\sin\theta} = q \tag{24}$$

for $\theta$. The idea is to take $\theta \gg 0$, so that the first term in the numerator dominates. We consider three cases: (i) $0 < \varphi < \pi/2$, (ii) $\varphi = 0$, and (iii) $\pi/2 \le \varphi < \pi$. The last case follows from the first two, since given any $\varphi$ in the interval $[\pi/2, \pi)$, we may solve the version of (24) stated for $\tilde\varphi := \varphi - \pi/2$ instead of $\varphi$ and $-q$ instead of $q$, and the same $\theta$ then solves (24) (since $\sin(2(\tilde\varphi + \theta)) = -\sin(2(\varphi + \theta))$ and $\sin(2(\tilde\varphi - \theta)) = -\sin(2(\varphi - \theta))$).

Case (i): We introduce the functions $A(\theta) := e^{\theta}\sin(2(\varphi + \theta))$ and $B(\theta) := e^{-\theta}\sin(2(\varphi - \theta))$, and note that $|B(\theta)| < 1$ for $\theta > 0$. Pick $\alpha > 0$ small enough that

$$4\alpha < \varphi < \pi/2 - 4\alpha \tag{25}$$

(from which it also follows that $\alpha < \pi/16$). Let

$$\sigma_1 := e^{2\alpha - \pi/2}\sin(2\varphi + 4\alpha - \pi), \quad \sigma_2 := e^{-2\alpha}\sin(2\varphi - 4\alpha).$$

Observe that $\varphi < \pi/2 - 4\alpha$ implies that

$$-\pi < 2\varphi + 4\alpha - \pi < -4\alpha < 0$$

so $\sigma_1 < 0$, and that $4\alpha < \varphi$ implies

$$0 < 4\alpha < 2\varphi - 4\alpha < 2\varphi < \pi$$

so $\sigma_2 > 0$. Now pick an odd integer $k$ large enough so that

$$\frac{e^{k\pi}\sigma_1 + 1}{\cos 2\alpha} < q \quad \text{and} \quad \frac{e^{k\pi}\sigma_2 - 1}{\sin 2\alpha} > q$$

and introduce $\theta_1 := k\pi - \pi/2 + 2\alpha$, $\theta_2 := k\pi - 2\alpha$, and the interval $I := [\theta_1, \theta_2]$. On this interval, $\sin\theta$ is decreasing and positive; in fact it satisfies $\sin 2\alpha = \sin\theta_2 \le \sin\theta \le \sin\theta_1 = \cos 2\alpha$. In particular, the function

$$C(\theta) := \frac{A(\theta) - B(\theta)}{\sin\theta}$$

is well-defined and continuous on $I$. Moreover, since $A(\theta_1) = e^{k\pi}\sigma_1$, $A(\theta_2) = e^{k\pi}\sigma_2$, and $|B(\theta)| < 1$ for all $\theta \in I$, and because of the choice of $k$,

$$C(\theta_1) < q \quad \text{and} \quad C(\theta_2) > q$$

so we conclude that, indeed, we can solve $C(\theta) = q$.

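Case (ii): If $\varphi = 0$, then (24) reduces to $(e^{\theta}\sin 2\theta + e^{-\theta}\sin 2\theta)/\sin\theta = q$ or equivalently

$$2e^{\theta}\cos\theta + 2e^{-\theta}\cos\theta = q.$$

We pick a positive integer $k$ large enough so that

$$-\sqrt{2}\,e^{2k\pi + 3\pi/4} + 2 < q \quad \text{and} \quad \sqrt{2}\,e^{2k\pi + \pi/4} - 2 > q.$$

Now consider the function $C(\theta) = 2e^{\theta}\cos\theta + 2e^{-\theta}\cos\theta$ on the interval $I = [\theta_1, \theta_2] = [2k\pi + \pi/4, 2k\pi + 3\pi/4]$. Since $|2e^{-\theta}\cos\theta| < 2$ on $I$, we have that $C(\theta_1) > q$ and $C(\theta_2) < q$, so we can again solve $C(\theta) = q$. ∎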
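The intermediate-value argument of Case (i) can be followed numerically. The sketch below solves equation (24) for given $\varphi \in (0, \pi/2)$ and $q$ by bisection on the interval $I$. It is only an illustration: the specific choice $\alpha = \min\{\varphi, \pi/2 - \varphi\}/8$ is our assumption (any $\alpha$ satisfying (25) would do), and the function name is hypothetical.

```python
import math

def solve_24(phi, q):
    """Find theta in I = [k*pi - pi/2 + 2*alpha, k*pi - 2*alpha], k odd, solving (24)."""
    A = lambda th: math.exp(th) * math.sin(2 * (phi + th))
    B = lambda th: math.exp(-th) * math.sin(2 * (phi - th))
    C = lambda th: (A(th) - B(th)) / math.sin(th)
    alpha = min(phi, math.pi / 2 - phi) / 8          # assumed choice satisfying (25)
    s1 = math.exp(2 * alpha - math.pi / 2) * math.sin(2 * phi + 4 * alpha - math.pi)  # < 0
    s2 = math.exp(-2 * alpha) * math.sin(2 * phi - 4 * alpha)                          # > 0
    k = 1
    while not ((math.exp(k * math.pi) * s1 + 1) / math.cos(2 * alpha) < q
               < (math.exp(k * math.pi) * s2 - 1) / math.sin(2 * alpha)):
        k += 2                                       # keep k odd
    lo = k * math.pi - math.pi / 2 + 2 * alpha       # C(lo) < q
    hi = k * math.pi - 2 * alpha                     # C(hi) > q
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if C(mid) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

phi, q = 0.6, 3.0
theta = solve_24(phi, q)
lhs = (math.exp(theta) * math.sin(2 * (phi + theta))
       - math.exp(-theta) * math.sin(2 * (phi - theta))) / math.sin(theta)
print(theta, lhs)
```

The returned $\theta$ lies strictly between consecutive multiples of $\pi$, so $\sin\theta \neq 0$, as required before setting $\chi := (2/r)\sin\theta$.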



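Corollary 3.5 For any fixed positive integer $r$, consider the subset $\mathcal{R}_r = \mathcal{S}^{r-1} \times \mathcal{R}$ of $\mathbb{R}^{2r+1}$. Then $\mathrm{sec}(\mathcal{R}_r) = \mathbb{S}^{2r}$.

Proof. Take any $\theta \in \mathbb{S}^{2r}$, and write it in the form $(z_1, \ldots, z_{r-1}, w)$ with $z_i \in \mathbb{C}$ and $w \in \mathbb{C} \times \mathbb{R}$. Using Lemma 3.4, we pick $\omega_1, \omega_2 \in \mathcal{R}$ and $\chi > 0$ such that $\omega_1 - \omega_2 = \chi w$. Next, using Lemma 3.3, we find $\zeta_{ij} \in \mathcal{S}$, $i = 1, \ldots, r-1$, $j = 1, 2$, such that $\zeta_{i1} - \zeta_{i2} = \chi z_i$ for each $i = 1, \ldots, r-1$. So $a_j := (\zeta_{1j}, \ldots, \zeta_{r-1,j}, \omega_j) \in \mathcal{R}_r$ for $j = 1, 2$ satisfy $a_1 - a_2 = \chi\theta$. Since $\theta$ has unit norm, it follows that $\chi = |a_1 - a_2|$, so $\theta = (a_1 - a_2)/|a_1 - a_2| \in \mathrm{sec}(\mathcal{R}_r)$. ∎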
3.2 The Bound 2r+1 is Best Possible

We now present an example which shows that the number $2r+1$ in Theorem 2 cannot be lowered. For this, we must exhibit, for each positive integer $r$, an analytic response $\beta$ with the property that $\mathcal{G}_{\beta,2r}$ either is not generic or has less than full measure in $\mathbb{U}^{2r}$. In fact, we will show far more: we will show that $\mathcal{G}_{\beta,2r}$ is empty.

The example is as follows. Given any fixed $r$, we consider the mapping

$$g : [0, \infty)^{r-1} \times (0, \infty) \to \mathbb{R}^{2r+1} : (t_1, \ldots, t_r) \mapsto (t_1e^{it_1}, \ldots, t_{r-1}e^{it_{r-1}}, e^{it_r}, e^{t_r}\sin 2t_r)$$

whose image is $\mathcal{R}_r$ (note that $(1, 0, 0)$ can be obtained as $(e^{it_r}, e^{t_r}\sin 2t_r)$ for $t_r = 2\pi$, so $t_r = 0$ is not required), and let $f : \mathbb{X} = \mathbb{R}^{r-1} \times (0, \infty) \to \mathbb{U} = \mathbb{R}^{2r+1}$ be given by $f(t_1, \ldots, t_r) = g(t_1^2, \ldots, t_{r-1}^2, t_r)$, so also $f(\mathbb{X}) = \mathcal{R}_r$. We let $\beta(x, u) = f(x) \cdot u$. By Proposition 3.1, if there is a universal distinguishing set of cardinality $2r$ then $\mathrm{sec}(\mathcal{R}_r) \neq \mathbb{S}^{2r}$ (note that $m - 1 = 2r$). This contradicts Corollary 3.5, so no such set can exist. ∎

We can modify this example so that the input set $\mathbb{U}$ is scalar, as follows. Let us consider the following response, with $\tilde{\mathbb{U}} = \mathbb{R}$:

$$\tilde\beta(x, u) := \beta(x, \psi(u)) = f(x) \cdot \psi(u)$$

where $\psi : \mathbb{R} \to \mathbb{R}^{2r+1} : u \mapsto (1, u, u^2, \ldots, u^{2r})$, leaving $f$ and $\mathbb{X}$ unchanged. We claim that there is no universal distinguishing set of cardinality $2r$. Indeed, suppose that $\tilde{\mathbb{U}}_0$ would be a $2r$-element universal distinguishing set. Consider the set $\mathbb{U}_0 := \psi(\tilde{\mathbb{U}}_0)$. As this set has $2r$ elements, it cannot be a universal distinguishing set for $\beta$. Thus, there exist two parameters $x_1$ and $x_2$ which are distinguishable for $\beta$, that is $f(x_1) \neq f(x_2)$, but such that $\beta(x_1, v) = \beta(x_2, v)$ for all $v \in \mathbb{U}_0$, which implies $\tilde\beta(x_1, u) = \tilde\beta(x_2, u)$ for all $u \in \tilde{\mathbb{U}}_0$. If we show that $\tilde\beta(x_1, u) \neq \tilde\beta(x_2, u)$ for some $u \in \mathbb{R}$ then we will have a contradiction with $\tilde{\mathbb{U}}_0$ being a universal distinguishing set. To see this, simply notice that $\psi(\mathbb{R})$ linearly spans $\mathbb{R}^{2r+1}$: if the vector $a \in \mathbb{R}^{2r+1}$ is nonzero, then $a \cdot \psi(u) \neq 0$ for some $u \in \mathbb{R}$ ($\sum a_i u^i \equiv 0 \Rightarrow a_i = 0 \ \forall i$), and apply with $a = f(x_1) - f(x_2)$. ∎
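The role of the moment curve $\psi$ can be made concrete. The sketch below (names and sample nodes are ours, for illustration only) builds, for any $2r$ scalar inputs, a nonzero vector $a$ orthogonal to all the corresponding $\psi(u_i)$, namely the coefficient vector of $\prod_i (u - u_i)$, and then checks that $a \cdot \psi(u)$ is nonzero at another input. It illustrates only the linear-algebra part of the argument; that such an $a$ is actually a difference $f(x_1) - f(x_2)$ is what Corollary 3.5 provides.

```python
def psi(u, r):
    """Moment-curve map psi(u) = (1, u, u^2, ..., u^(2r))."""
    return [u ** i for i in range(2 * r + 1)]

def annihilator(nodes):
    """Coefficients (lowest degree first) of prod_i (u - u_i).
    The result a satisfies a . psi(u_i) = 0 at every node."""
    a = [1.0]
    for ui in nodes:
        shifted = [0.0] + a                      # coefficients of u * p(u)
        scaled = [-ui * c for c in a] + [0.0]    # coefficients of -u_i * p(u)
        a = [s + t for s, t in zip(shifted, scaled)]
    return a

dot = lambda v, w: sum(x * y for x, y in zip(v, w))

r = 2
nodes = [-1.0, 0.5, 2.0, 3.0]                    # any 2r scalar inputs
a = annihilator(nodes)                           # nonzero vector in R^(2r+1)
print([dot(a, psi(u, r)) for u in nodes])        # vanishes at every node (up to rounding)
print(dot(a, psi(1.0, r)))                       # but not at u = 1
```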



4 Application to Systems 

Now we apply the results about abstract responses to the special case of identifying parameters in systems, proving Theorem 1. We take an analytically parametrized system $\Sigma$, with $r = \dim \mathbb{X}$. When the number of measurements $p = 1$, the result follows from Theorem 2 applied to $\beta = \beta_\Sigma$.

For the general case, we consider the scalar responses $\beta^i_\Sigma$, $i = 1, \ldots, p$, which are obtained as coordinate projections of $\beta_\Sigma$. We claim that, for each fixed $q$, and with the obvious notations, $\bigcap_i \mathcal{G}^i_q \subseteq \mathcal{G}_q$. Indeed, take any $w \in \bigcap_i \mathcal{G}^i_q$, and any $x_1, x_2$ such that $x_1 \not\sim x_2$. Then there must be some $i \in \{1, \ldots, p\}$ such that $x_1 \not\sim x_2$ for the response $\beta^i_\Sigma$, so, since $w \in \mathcal{G}^i_q$, it follows that $\beta^i_\Sigma(x_1, w) \neq \beta^i_\Sigma(x_2, w)$, and therefore also $\beta_\Sigma(x_1, w) \neq \beta_\Sigma(x_2, w)$. This proves that $w \in \mathcal{G}_q$. Since the intersection of a finite (or even countable) number of generic and full measure sets is again generic and of full measure, $\mathcal{G}_q$ must have this property. This completes the proof of Theorem 1 for arbitrary $p$. ∎



4.1 A System for which 2r+1 Experiments are Needed



We can express the responses $\beta$ or $\tilde\beta$ from Section 3.2 as the response $\beta_\Sigma$ for an analytically parametrized system, and in this way know that the number $2r+1$ in Theorem 1 cannot be lowered to $2r$, and in fact, there are analytic systems with $r$ parameters for which there is not even a single universal distinguishing set of cardinality $2r$. The simplest $\Sigma$ would be obtained by using any $f$, and just defining $h(z, u, x) = \beta(x, u)$. It is far more interesting, however, to give an example where only polynomials appear in the system description and $h$ does not depend directly on $u$ and $x$. We do this explicitly for the case $r = 1$; the case of arbitrary $r$ is entirely analogous.

The system $\Sigma$ that we construct has state space $M = \mathbb{R}^9$, input-value space $\mathbb{U} = \mathbb{R}$, parameter space $\mathbb{X} = (0, \infty)$, experiment space $\Lambda = \mathbb{R}$, and $p = 1$, and is given by:

$$\chi : (0, \infty) \to \mathbb{R}^9 : a \mapsto (a, 1, 0, 0, 1, 0, 1, 0, 1),$$

$$h : \mathbb{R}^9 \to \mathbb{R} : (z_1, z_2, \ldots, z_9) \mapsto z_2z_5 + z_3z_6 + z_4z_8z_9$$

(independent of $u$ and $x$),

$$\mu : \Lambda \times \mathbb{R} \to \mathbb{U} : (\lambda, t) \mapsto \lambda$$

(i.e., inputs are constant scalars), and

$$f(z, u, x) := (0, 0, uz_2, 2uz_3, -z_1z_6, z_1z_5, -2z_1z_8, 2z_1z_7, z_1z_9).$$

The solution $z(t)$ with initial condition $z(0) = \chi(a)$ and input $u(t) \equiv \lambda$ is the following vector:

$$(a, 1, \lambda t, \lambda^2t^2, \cos at, \sin at, \cos 2at, \sin 2at, e^{at})$$

and therefore

$$\beta_\Sigma(a, \lambda) = h(z(1)) = \varphi(a) \cdot \psi(\lambda)$$

where $\varphi(a) = (\cos a, \sin a, e^{a}\sin 2a)$ and $\psi(\lambda) = (1, \lambda, \lambda^2)$, which is $\tilde\beta(a, \lambda)$. ∎
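The closed-form solution of this polynomial system can be cross-checked by direct simulation. The sketch below integrates the nine equations with a classical fourth-order Runge-Kutta scheme (our choice of integrator and step count, not part of the paper) and compares $h(z(1))$ with $\varphi(a) \cdot \psi(\lambda)$.

```python
import math

def rhs(z, u):
    """Right-hand side f(z, u, x) of the 9-dimensional polynomial system."""
    z1, z2, z3, z4, z5, z6, z7, z8, z9 = z
    return [0.0, 0.0, u * z2, 2 * u * z3, -z1 * z6, z1 * z5,
            -2 * z1 * z8, 2 * z1 * z7, z1 * z9]

def h(z):
    """Output map h(z) = z2*z5 + z3*z6 + z4*z8*z9 (0-based indexing below)."""
    return z[1] * z[4] + z[2] * z[5] + z[3] * z[7] * z[8]

def response(a, lam, steps=2000):
    """Integrate from z(0) = chi(a) with constant input lam up to t = 1; return h(z(1))."""
    z = [a, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
    dt = 1.0 / steps
    for _ in range(steps):
        k1 = rhs(z, lam)
        k2 = rhs([zi + 0.5 * dt * ki for zi, ki in zip(z, k1)], lam)
        k3 = rhs([zi + 0.5 * dt * ki for zi, ki in zip(z, k2)], lam)
        k4 = rhs([zi + dt * ki for zi, ki in zip(z, k3)], lam)
        z = [zi + dt / 6 * (p + 2 * q + 2 * s + w)
             for zi, p, q, s, w in zip(z, k1, k2, k3, k4)]
    return h(z)

def closed_form(a, lam):
    """phi(a) . psi(lam) with phi(a) = (cos a, sin a, e^a sin 2a), psi(lam) = (1, lam, lam^2)."""
    return math.cos(a) + lam * math.sin(a) + lam ** 2 * math.exp(a) * math.sin(2 * a)

a, lam = 1.3, 0.7
print(response(a, lam), closed_form(a, lam))
```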

A small modification of this example has $h$ linear: just add an additional variable $z_{10}$ with initial condition $z_{10}(0) = 0$ and satisfying $\dot z_{10} = (z_2z_5 + z_3z_6 + z_4z_8z_9)'$ (written, using $\dot z_3 = uz_2$, etc., in terms of the $z_i$ and $u$), and now use $h(z) = z_{10}$.



5 Distinguishability in the Operon Example 

We now show, for the operon system (5) with external input, that every two distinct parameters are distinguishable. We work out such an example in order to emphasize that the problem of determining identifiability is nontrivial, which makes it more interesting that Theorem 1 applies without this knowledge. The experiments $(\lambda, T)$ consist of using constant inputs $u(t) \equiv \lambda$ for varying intervals $[0, T]$ and measuring $M(T)$. Thus, we wish to prove: if for every nonnegative $\lambda$, the solution $(M(t), E(t))$ of $\dot M = E^m/(1 + E^m) - aM$, $\dot E = M - bE - \lambda E$, with initial condition $(M_0, E_0)$, and the solution $(M^\dagger(t), E^\dagger(t))$ of $\dot M^\dagger = (E^\dagger)^{m^\dagger}/(1 + (E^\dagger)^{m^\dagger}) - a^\dagger M^\dagger$, $\dot E^\dagger = M^\dagger - b^\dagger E^\dagger - \lambda E^\dagger$ with initial condition $(M_0^\dagger, E_0^\dagger)$ are such that $M(t) \equiv M^\dagger(t)$, then necessarily $M_0 = M_0^\dagger$, $E_0 = E_0^\dagger$, $m = m^\dagger$, $a = a^\dagger$, and $b = b^\dagger$.

Assume given two parameters $(M_0, E_0, m, a, b)$ and $(M_0^\dagger, E_0^\dagger, m^\dagger, a^\dagger, b^\dagger)$ with this property (recall that the entries are all positive, by assumption). Since $M(t) \equiv M^\dagger(t)$, of course $M_0 = M_0^\dagger$, and we write $\xi$ for their common value. Now fix an arbitrary $\lambda$ and look at $M(1)$ and $M^\dagger(1)$. We have that $M(1) = e^{-a}\xi + \int_0^1 e^{-a(1-t)}\alpha(t)\,dt$, where we define $\alpha(t) = \frac{E^m(t)}{1 + E^m(t)}$ and note that $\alpha(t) < 1$ for all $t$. It follows that $M(1) < 1 + \xi$ is bounded independently of the value of $\lambda$. On the other hand, $E(t) = e^{-(b+\lambda)t}E_0 + \int_0^t e^{-(b+\lambda)(t-s)}M(s)\,ds$. Since $e^{-(b+\lambda)(t-s)} \to 0$ as $\lambda \to \infty$ for each $s < t$, and $M$ is bounded, we have by dominated convergence that $E(t) \to 0$ as $\lambda \to \infty$, for each fixed $t > 0$. Thus also $\alpha(t) \to 0$ as $\lambda \to \infty$, for each fixed $t > 0$. Now applying this to the above formula for $M(1)$, and again by dominated convergence, we have that $M(1) \to e^{-a}\xi$ as $\lambda \to \infty$. Since $M(1) = M^\dagger(1)$ for any given $\lambda$, and since by an analogous argument we also have that $M^\dagger(1) \to e^{-a^\dagger}\xi$ as $\lambda \to \infty$, we conclude that $e^{-a}\xi = e^{-a^\dagger}\xi$, and therefore that $a = a^\dagger$.

From the original differential equation $\dot M = \frac{E^m}{1 + E^m} - aM$, we know that $\alpha = \dot M + aM$, and we also have the same formula ($a$ is the same, and $M$ too) for the second set of parameters, which gives us that $\alpha(t) \equiv \alpha^\dagger(t)$, and therefore (since $p \mapsto \frac{p}{1+p}$ is one to one) that $E^m(t) \equiv (E^\dagger)^{m^\dagger}(t)$ for all $t$, no matter what input $\lambda$ is used. For any given $\lambda > 0$, we introduce the function

$$w(t) = \frac{1}{\lambda}\ln E^m(t)$$

and similarly for the second set of parameters. As $E^m \equiv (E^\dagger)^{m^\dagger}$, also $w \equiv w^\dagger$. Calculating, we have that $\dot w = \frac{m}{\lambda E}(M - bE) - m$, and similarly for $w^\dagger$. Thus we obtain, evaluating at $t = 0$:

$$\frac{m}{\lambda E_0}(\xi - bE_0) - m = \frac{m^\dagger}{\lambda E_0^\dagger}(\xi - b^\dagger E_0^\dagger) - m^\dagger$$

and taking now the limit as $\lambda \to \infty$ we conclude that $m = m^\dagger$. Thus, since now we know that $E^m \equiv (E^\dagger)^m$, we can conclude that $E \equiv E^\dagger$ and in particular that $E_0 = E_0^\dagger$ and that $dE/dt \equiv dE^\dagger/dt$ (for any given value of the input $\lambda$). Finally, for $\lambda = 0$ we have from $\dot E = M - bE$ that $b = b^\dagger$. ∎
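The first step of this argument, namely that $M(1) \to e^{-a}\xi$ as $\lambda \to \infty$, is easy to observe numerically. The sketch below is an illustration only: the parameter values are arbitrary assumptions, and a fixed-step fourth-order Runge-Kutta integrator (our choice) is used with a step small enough to remain stable for the largest $\lambda$ tried.

```python
import math

def M_at_1(lam, M0=1.0, E0=1.0, m=2.0, a=0.5, b=1.0, steps=20000):
    """Integrate M' = E^m/(1+E^m) - a*M, E' = M - (b + lam)*E on [0, 1]; return M(1)."""
    def rhs(M, E):
        Em = E ** m
        return Em / (1.0 + Em) - a * M, M - (b + lam) * E
    dt = 1.0 / steps
    M, E = M0, E0
    for _ in range(steps):
        k1 = rhs(M, E)
        k2 = rhs(M + 0.5 * dt * k1[0], E + 0.5 * dt * k1[1])
        k3 = rhs(M + 0.5 * dt * k2[0], E + 0.5 * dt * k2[1])
        k4 = rhs(M + dt * k3[0], E + dt * k3[1])
        M += dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        E += dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return M

limit = 1.0 * math.exp(-0.5)              # e^{-a} * xi with xi = M0 = 1 and a = 0.5
for lam in (0.0, 10.0, 100.0, 1000.0):
    print(lam, M_at_1(lam) - limit)       # the gap shrinks as lam grows
```

Since the limit depends on the parameters only through $e^{-a}\xi$, reading it off from experiments with large $\lambda$ recovers $a$ once $\xi = M_0$ is known, which is the content of the first paragraph of the proof.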



6 Comments and Relations to Other Work 

We close with some general comments. 
6.1 Universal Distinguishing Sets 

The concepts of distinguishability and distinguishing sets are common in several fields. In control theory (see e.g. [25], Chapter 6), one studies the possibility of separating internal states (corresponding to parameters in the current context) on the basis of input/output experiments. The papers [23, 24] deal with applications of distinguishability to the study of local minima of least-squares error functions, and set-shattering in the sense of Vapnik and Chervonenkis, for artificial neural networks. More combinatorial, but essentially the same, notions have appeared
in computational learning theory (a teaching set is one which allows a "teacher" to uniquely 
specify the particular function being "taught" among all other functions of interest, see e.g. |§) 
and in the theory of experiments in automata and sequential machine theory (cf. ||). 



6.2 Observability 

The observability problem, that is, the reconstruction of all internal states of the system, is included in the problem discussed here, in the following sense. Suppose that parameters include all initial states, that is, $\mathbb{X} = M \times \mathbb{X}_0$ and the initial state map $\chi$ is a projection onto the components in $M$ (as in the examples in Section 1.1). Then distinguishability of parameters implies distinguishability of initial states, and, since the flow of a differential equation induces a group of diffeomorphisms (so the map $z(0) \mapsto z(T)$ is one-to-one, for each $T$ and each given input), also the distinguishability of states at any future time.



6.3 Restarting 

Notice an important feature of the setup. Since the objective is to find parameters, and these are 
constant, it is implicitly assumed that one may "restart" different experiments at the same initial 
state. In practice, this may or may not be a valid assumption. In fact, much work in control 
theory deals with identification problems for which one need not restart the system: this is the 
subject of the area of universal inputs for observability, cf. [12, 26, 3C[. On the other hand, 



in molecular biology multiple experiments, assuming identical initial conditions, are usually 
performed by careful assay controls, or by dealing with synchronized daughter cells. Indeed, 
because of the noisiness inherent in biological applications, data for a "single experiment" may 
actually represent an average of different runs under the same (approximate) conditions. In 
addition, many measurements in cell biology are destructive, and thus it is impossible to take
measurements at different times from the same cell, so the theory of universal inputs does not 
apply under such circumstances. 



6.4 Genericity 



The material in Section 2.3 on genericity is motivated by, and shares many of the techniques with, the theory of manifold embeddings (see also Section 6.6 below, as well as the remark in the proof about one-to-one maps). Closely related is also the work of Takens [28], which shows that generically, a smooth dynamical system on an $r$-dimensional manifold can be embedded in $\mathbb{R}^{2r+1}$, as well as the control-theory work of Aeyels on generic observability, which shows
in 0| that for generic vector fields and observation maps on an r-dimensional manifold, 2r+l 
observations at randomly chosen times are enough for observability, and in Q that this bound is 
best possible. Aeyels' proofs, in particular, are based on transversality arguments of the general
type that we use. 



6.5 The Examples 

In Lemma 3.4, if instead of $\mathcal{R}$ we had considered the set consisting of those $(\zeta, \xi) \in \mathbb{C} \times \mathbb{R}$
with $\zeta = e^{ir}$ and $\xi = \sin 2r$, then $\mathrm{sec}(\mathcal{R})$ would be a proper subset of $\mathbb{S}^2$. Indeed, this set was
studied in where it was shown to have nonzero measure and a complement also of nonzero 
measure. 



6.6 Whitney's Embedding Theorem 

The material in Section || is closely related to the proof of the "easy version" of Whitney's 
Embedding Theorem, cf. [|l(], p4j| . We briefly review this connection here. 

We suppose that $S$ is a compact $r$-dimensional embedded submanifold of $\mathbb{U} = \mathbb{R}^m$. We assume that $m > 2r+1$ (otherwise what follows is not interesting). The dimension $r$ may well be smaller than the dimension of $\mathbb{X}$. Of course, in general there is no reason whatsoever for $S$ to be a submanifold of $\mathbb{R}^m$, as the mapping $f$ may well have singularities. Thus, we are imposing yet another condition besides linearity on $u$. On the other hand, analyticity of $f$, i.e. analyticity of $\beta$ on $x$, is not needed in what follows.

The facts that there exist universal distinguishing sets $\mathbb{U}_0$ of cardinality $2r+1$ and, moreover, that a generic $\mathbb{U}_0$ has this property are, in the special case being considered here, an immediate consequence of the proof of the "easy version" of Whitney's Embedding Theorem. (The classical
embedding results date back to Menger's 1926 work (cf. for continuous functions and maps 
from topological spaces into Euclidean spaces, and the smooth version dealing with embeddings 
of differentiable manifolds of dimension r in M 2r+1 due to Whitney in A "harder" version 
of Whitney's theorem (cf. |52|) shows that one may embed such manifolds in M. 2r as well, and 
locally embed (immerse) in M 2r_1 when r > 1, see [Q.) 

Let us briefly sketch how this conclusion is obtained. We consider first the special case $m = 2r + 2$. The universal distinguishing sets consisting of $2r+1$ linearly independent vectors are in a one-to-one correspondence (up to a choice of basis) with the possible $(2r+1)$-dimensional subspaces $V$ of $\mathbb{R}^m$ for which $\mathrm{sec}(S) \cap u(V^\perp) = \emptyset$, or equivalently with the unit vectors $u \in \mathbb{R}^m$ which do not belong to $\mathrm{sec}(S)$. (Note that $u \in \mathrm{sec}(S)$ if and only if $-u \in \mathrm{sec}(S)$, so there is no need to work with projective space in dimension $m - 1$, and we may simply deal with unit vectors.) Thus, one needs to show that $\mathbb{S}^{m-1} \setminus \mathrm{sec}(S)$ is generic and of full measure. Now, $\mathrm{sec}(S)$ is the image of the differentiable mapping

$$S^{(2)} = \{(a, b) \in S^2 \mid a \neq b\} \to \mathbb{S}^{m-1} : (a, b) \mapsto \frac{a - b}{|a - b|}$$

and $S^{(2)}$ has dimension $2r < m - 1$ = dimension of $\mathbb{S}^{m-1}$. Thus, the Morse-Sard Theorem says that $\mathrm{sec}(S)$ has measure zero and is included in a countable union of closed nowhere dense sets, as wanted. The general case ($m \ge 2r + 3$) can be obtained inductively, by iteratively reducing to a smaller dimensional embedding space by means of projections along vectors $u$ picked as in the previous discussion, with a small modification: the choice of $u$ has to be made with some care, requiring in addition that all tangents to $S$ miss $u$; when doing so, the projection of $S$ has a manifold structure and the argument can indeed be repeated. See [14] for details,
and also Jl for an expository discussion of these ideas in the context of numerical algorithms 
which optimize the projections; the generalization of the material in this last reference, to cover 
special classes of nonlinear parametrizations, would be of great interest. 



6.7 The Techniques 

As we mentioned, the main result is based on the facts regarding analytic functions which we developed in our previous paper [23]. This is in contrast to work based on Whitney embeddings and transversality arguments. Quite related, on the other hand, is the recent (and independent of [23]) work [18], which deals with the distinguishability of fluid flows on the basis of a finite set of exact experiments: the authors provide a bound of the form "$16r + 1$" measurements (the number arises from the need to obtain appropriate parametrizations), where $r$ is the dimension of an attractor for the systems being studied, and they also employ analytic-function techniques in a manner very similar to ours.
6.8 Least Squares

We have not discussed the actual numerical computation of parameters on the basis of experiments, which is of course a most important direction of study. What we can say is the following: if all distinct parameters are distinguishable, then, for a generic set of $2r+1$ experiments, global minimization of the least-squares fit error function will result in a unique global minimum. But nothing is said about local minima nor, certainly, the effect of noise. (To study such effects, one will have to combine techniques as here with classical statistical tools, such as the Cramér-Rao inequality for the Fisher information matrix, which lower-bounds the covariance of any unbiased estimator. However, our global results are in any case of a rather different nature than these classical statistical techniques, which are closer to the nonsingular case treated in Section 2.2.) Note that even if not all parameters are distinguishable, the results in this paper might still be useful, see [23] for related work.



6.9 Vector Outputs 



The statement of the main theorems notwithstanding, the results are really about scalar mea- 
surements, in the sense that the number of simultaneous measurements p does not enter into the 
estimates. This is unavoidable: for instance, if all coordinates of h happen to be the same, no 
additional information can be gained. It would be of interest to come up with a natural (and 
verifiable) condition of independence which, when incorporated into the system description, 
would allow one to introduce a factor 1/p into the estimates. It is fairly obvious how to do such 
a thing with abstract responses and if there are enough input dimensions (m > p): provided 
that independence implies that the codimension of each of the sets U(x1, x2) is p instead of 1, the 
critical inequality (2r+1)(m - 1) + 2r < (2r+1)m becomes (2r/p + 1)(m - p) + 2r < (2r/p + 1)m, so 
2r/p + 1 randomly chosen experiments suffice. But in the case of systems, and even for abstract 
responses with low-dimensional U, how to state a good result is less clear. 
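
The dimension count behind this remark can be checked with a few lines of arithmetic. The sketch below assumes, as in the text, that the exceptional sets have codimension p in an m-dimensional input space, so that q experiments suffice as soon as q(m - p) + 2r < qm, that is, qp > 2r. The function names are hypothetical, and the use of the integer part when p does not divide 2r is our reading rather than a statement from the paper.

```python
def experiments_needed(r, p=1):
    """Smallest q with q*(m - p) + 2*r < q*m, i.e. q*p > 2*r (m cancels out)."""
    if r < 0 or p < 1:
        raise ValueError("need r >= 0 and p >= 1")
    return (2 * r) // p + 1

def inequality_holds(q, r, m, p=1):
    """The critical dimension inequality for q experiments."""
    return q * (m - p) + 2 * r < q * m

for r, p in [(3, 1), (3, 2), (4, 3)]:
    q = experiments_needed(r, p)
    print(f"r={r}, p={p}: q={q}")
```

For p = 1 this reduces to the 2r+1 bound of the main result.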



6.10 Structure 



The problem of structure determination, that is to say, finding the form of equations, can some- 
times be reduced to the problem studied here. Specifically, it usually happens in applications 
that one merely wishes to know if a particular term appears or not in the description of a 
differential equation. As an illustration, take the following situation in molecular biology: it 
is not known whether a variable, let us say z1, affects the evolution of another variable, let 
us say z2, but it is known that, if there is any effect at all, then this influence takes the form 
of an inhibitory feedback term c ^ Z\ appearing in the equation for z\. One reduces to the 
identification problem by thinking of "c" as a parameter; c = 0 corresponds to no effect. Given 
that the "hypothesis testing" problem "determine if c = 0 or c ≠ 0" is less demanding than the 
problem of actually finding the value of c, it is not surprising that fewer than 2r+1 experiments 
are required to settle this matter. In formal terms, one can prove that distinguishability of 
parameter vectors x from a fixed vector x0 can be attained by means of randomly chosen sets 
of r + 1 experiments. The proof of this fact is entirely analogous to the one given for our main 
theorem; the only difference is that in the definition of the sets Vp A , we can simply look at 
elements (x, w) ∈ (X \ {x0}) × U^(q), and this has dimension qm + r - q, so using q = r + 1 we 
obtain a projection Bp^+i, now defined in terms of existence of x not equivalent to x0, and this 
is a union of manifolds of positive codimension in U^(r+1). 
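
To make the hypothesis-testing reduction concrete, here is a minimal sketch under stated assumptions. The scalar model z' = u - (1 + c)z, z(0) = 0, is a hypothetical stand-in for a system with one unknown coefficient c (so r = 1), not the equation discussed above, and the function names are ours. With r + 1 = 2 randomly chosen exact experiments, the data are compared against the responses predicted by the fixed parameter c = 0.

```python
import math
import random

def response(c, w):
    """Output at time t of z' = u - (1 + c)*z, z(0) = 0, under a constant input u."""
    u, t = w
    return u / (1.0 + c) * (1.0 - math.exp(-(1.0 + c) * t))

def consistent_with_no_effect(experiments, data, tol=1e-9):
    """True if every measurement matches the response predicted by c = 0."""
    return all(abs(response(0.0, w) - y) <= tol
               for w, y in zip(experiments, data))

r = 1                    # one unknown parameter, c
rng = random.Random(1)   # fixed seed so the run is reproducible

# r + 1 randomly chosen experiments, each a pair (input amplitude u, time t).
experiments = [(rng.uniform(0.5, 2.0), rng.uniform(0.5, 3.0))
               for _ in range(r + 1)]

data_null = [response(0.0, w) for w in experiments]    # generated with c = 0
data_effect = [response(0.5, w) for w in experiments]  # generated with c = 0.5

print(consistent_with_no_effect(experiments, data_null))
print(consistent_with_no_effect(experiments, data_effect))
```

The tolerance only absorbs floating-point rounding; measurements are assumed exact, as throughout the paper.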